
# sleep_state_detection

> Determining people's sleep state from wrist-worn accelerometer data

## Problem: We need to efficiently track sleep states 

Sleep is essential to human health. To study it properly, researchers need to measure accurately when people fall asleep and when they wake up. Research has been challenging "due to the lack of naturalistic data capture alongside accurate annotation. If data science could help researchers better analyze wrist-worn accelerometer data for sleep monitoring, sleep experts could more easily conduct large-scale studies of sleep, thus improving the understanding of sleep's importance and function." (Esper et al., 2023). Consumers could also benefit by tracking their own sleep habits with an inexpensive wearable device.

## Deliverables
 * A model that takes in wrist-worn accelerometer data and predicts when sleep onset and wakeup events occur.
 * A report and slide deck outlining process, outcomes, and recommendations

## Stakeholders
Researchers who study sleep, and companies that want to use accelerometer data to track sleep and help people improve their health.


## Proposed solution: 
A deep learning model that takes in accelerometer data and outputs predicted "onset" and "wakeup" events along with confidence scores between 0 and 1. We treat this as a segmentation problem, segmenting out the sleep period from the awake period while also predicting the transition time. 

The baseline solution started from code in this MIT-licensed repo: https://github.com/tubo213/kaggle-child-mind-institute-detect-sleep-states/tree/main

### Recommendations
1) Use these predictions to automatically annotate sleep onset and wake-up times.
2) Utilize these annotations in the context of studying sleep and its overall health effects.
3) Apply the annotations in a business context to help individuals understand their own sleep patterns and how they affect their health. 

## Methodology overview 

### Model description

 * **Model inputs** Multi-day accelerometer data in 5-second steps.

 * **Model outputs** A list of time steps and the probabilities that each contains an "onset" or "wakeup" event.
 
### Metric
Submissions are evaluated on the average precision of detected events, averaged over timestamp error tolerance thresholds, averaged over event classes.

Detections are matched to ground-truth events within error tolerances, with ambiguities resolved in order of decreasing confidence. For both event classes, we use error tolerance thresholds of 1, 3, 5, 7.5, 10, 12.5, 15, 20, 25, 30 in minutes, or 12, 36, 60, 90, 120, 150, 180, 240, 300, 360 in steps.
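
The conversion between the two tolerance lists follows from the 5-second step size. A minimal sketch (the names `STEP_SECONDS` and `minutes_to_steps` are illustrative, not taken from the project code):

```python
STEP_SECONDS = 5  # sampling interval stated under "Model inputs"
TOLERANCES_MINUTES = [1, 3, 5, 7.5, 10, 12.5, 15, 20, 25, 30]


def minutes_to_steps(minutes, step_seconds=STEP_SECONDS):
    """Convert a tolerance in minutes to a whole number of steps."""
    return round(minutes * 60 / step_seconds)


TOLERANCES_STEPS = [minutes_to_steps(m) for m in TOLERANCES_MINUTES]
print(TOLERANCES_STEPS)
```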

#### **Detailed Description**

**Evaluation proceeds in three steps:**

**Assignment** - Predicted events are matched with ground-truth events.

**Scoring** - Each group of predictions is scored against its corresponding group of ground-truth events via Average Precision.

**Reduction** - The multiple AP scores are averaged to produce a single overall score.

**Assignment**

For each set of predictions and ground-truths within the same event x tolerance x series_id group, we match each ground-truth to the highest-confidence unmatched prediction occurring within the allowed tolerance.

Some ground-truths may not be matched to a prediction and some predictions may not be matched to a ground-truth. They will still be accounted for in the scoring, however.

**Scoring**

Collecting the events within each series_id, we compute an Average Precision score for each event x tolerance group. The average precision score is the area under the precision-recall curve generated by decreasing confidence score thresholds over the predictions. In this calculation, matched predictions over the threshold are scored as TP and unmatched predictions as FP. Unmatched ground-truths are scored as FN.

**Reduction**

The final score is the average of the above AP scores, first averaged over tolerance, then over event.
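
The three steps can be sketched in plain Python. This is a simplified illustration, not the official competition implementation. It assumes that predictions are processed in order of decreasing confidence, that each one claims the nearest still-unmatched ground truth strictly inside the tolerance, and that scores are distinct. The function names and the dictionary layout are hypothetical.

```python
from statistics import mean


def match_detections(gt_steps, preds, tolerance):
    """Assignment for one series_id/event/tolerance group.

    preds is a list of (step, score); returns (score, is_true_positive) pairs.
    """
    matched = set()
    results = []
    for step, score in sorted(preds, key=lambda p: -p[1]):
        best, best_err = None, tolerance
        for i, gt in enumerate(gt_steps):
            err = abs(step - gt)
            if i not in matched and err < best_err:
                best, best_err = i, err
        if best is not None:
            matched.add(best)
        results.append((score, best is not None))
    return results


def average_precision(results, n_gt):
    """Scoring: sum of precision at each true positive, divided by n_gt."""
    if n_gt == 0:
        return 0.0
    tp, total = 0, 0.0
    ranked = sorted(results, key=lambda r: -r[0])
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            tp += 1
            total += tp / rank
    return total / n_gt  # unmatched ground truths lower recall (FN)


def event_detection_ap(ground_truth, predictions, tolerances):
    """Reduction: average over tolerances, then over event classes.

    ground_truth: {(series_id, event): [step, ...]}
    predictions:  {(series_id, event): [(step, score), ...]}
    """
    keys = set(ground_truth) | set(predictions)
    per_event = []
    for event in sorted({ev for _, ev in keys}):
        per_tolerance = []
        for tolerance in tolerances:
            pooled, n_gt = [], 0
            for series_id, ev in keys:
                if ev != event:
                    continue
                gt_steps = ground_truth.get((series_id, ev), [])
                preds = predictions.get((series_id, ev), [])
                pooled.extend(match_detections(gt_steps, preds, tolerance))
                n_gt += len(gt_steps)
            per_tolerance.append(average_precision(pooled, n_gt))
        per_event.append(mean(per_tolerance))
    return mean(per_event)


# Toy example with one series and two tolerances (in steps).
gt = {("s1", "onset"): [100, 1000], ("s1", "wakeup"): [500]}
pred = {
    ("s1", "onset"): [(105, 0.9), (400, 0.8), (1010, 0.7)],
    ("s1", "wakeup"): [(520, 0.6)],
}
score = event_detection_ap(gt, pred, tolerances=[12, 36])
print(round(score, 4))
```

In the toy example the wakeup prediction is 20 steps off, so it counts as a false positive at the 12-step tolerance and as a true positive at the 36-step tolerance. This is why tighter timing improves the averaged score.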


### Data 
The dataset comprises about 500 multi-day recordings of wrist-worn accelerometer data annotated with two event types: onset, the beginning of sleep, and wakeup, the end of sleep. 

Though each series is a continuous recording, there may be periods in the series when the accelerometer device was removed. These periods are identified as those where suspiciously little variation in the accelerometer signals occurs over an extended period of time, which is unrealistic for typical human participants. Events are not annotated for these periods, and you should refrain from making event predictions during them: an event prediction will be scored as a false positive (Esper et al., 2023).
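
One simple way to flag such periods is to look for windows in which the `anglez` range is close to zero. The sketch below is a heuristic illustration only. The function name, `window`, and `max_range` are hypothetical, and the dataset authors' exact rule is not given here.

```python
def flag_low_variation(anglez, window, max_range):
    """Mark steps that fall inside any run of `window` consecutive steps
    whose anglez range (max - min) is at most `max_range`."""
    flags = [False] * len(anglez)
    for start in range(len(anglez) - window + 1):
        chunk = anglez[start:start + window]
        if max(chunk) - min(chunk) <= max_range:
            for i in range(start, start + window):
                flags[i] = True
    return flags


# Toy signal: the flat stretch in the middle is flagged.
signal = [0.0, 10.0, -20.0, 5.0, 5.0, 5.0, 5.0, 30.0]
flags = flag_low_variation(signal, window=3, max_range=0.5)
print(flags)
```

On real recordings the window would need to span an extended period, for example several hours of 5-second steps. A rolling-window implementation would be faster than this quadratic loop.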

#### Files and Field Descriptions

- **train_series.parquet** - Series to be used as training data. Each series is a continuous recording of accelerometer data for a single subject spanning many days.
  - **series_id** - Unique identifier for each accelerometer series.
  - **step** - An integer timestep for each observation within a series.
  - **timestamp** - A corresponding datetime with ISO 8601 format `%Y-%m-%dT%H:%M:%S%z`.
  - **anglez** - As calculated and described by the GGIR package, z-angle is a metric derived from individual accelerometer components that is commonly used in sleep detection, and refers to the angle of the arm relative to the vertical axis of the body.
  - **enmo** - As calculated and described by the GGIR package, ENMO is the Euclidean Norm Minus One of all accelerometer signals, with negative values rounded to zero. While no standard measure of acceleration exists in this space, this is one of the several commonly computed features.

- **train_events.csv** - Sleep logs for series in the training set recording onset and wake events.
  - **series_id** - Unique identifier for each series of accelerometer data in `train_series.parquet`.
  - **night** - An enumeration of potential onset / wakeup event pairs. At most one pair of events can occur for each night.
  - **event** - The type of event, whether onset or wakeup.
  - **step and timestamp** - The recorded time of occurrence of the event in the accelerometer series.
  
### Exploratory data analysis 
* No missing values for enmo and anglez columns
* About one third of the nights do not have "onset" or "wakeup" annotations. This is likely due to the person taking the accelerometer off, which can be inferred because the anglez range becomes very small during these times.
* There are a few nights for each person where only one event is annotated.
* During sleep, the enmo values become much smaller and less volatile, especially at the beginning of the sleep cycle. Similarly, the anglez values fluctuate less rapidly.
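
The annotation counts above can be reproduced by tallying, per night, how many events have a recorded step. This is a minimal sketch that assumes rows shaped like `train_events.csv` and assumes an unannotated event has a missing `step` (represented as `None` here). The function name is illustrative.

```python
from collections import Counter, defaultdict


def annotation_counts(events):
    """Count nights by their number of annotated events (0, 1, or 2)."""
    per_night = defaultdict(int)
    for row in events:
        key = (row["series_id"], row["night"])
        per_night[key] += row["step"] is not None  # True counts as 1
    return Counter(per_night.values())


rows = [
    {"series_id": "s1", "night": 1, "event": "onset", "step": 100},
    {"series_id": "s1", "night": 1, "event": "wakeup", "step": 500},
    {"series_id": "s1", "night": 2, "event": "onset", "step": None},
    {"series_id": "s1", "night": 2, "event": "wakeup", "step": None},
    {"series_id": "s1", "night": 3, "event": "onset", "step": 900},
    {"series_id": "s1", "night": 3, "event": "wakeup", "step": None},
]
counts = annotation_counts(rows)
print(counts)
```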

### Preprocessing 

**Features** 
- **shape**: (n_features, `cfg.duration`), (10, 5760) in the current best model 
- **Sine and cosine components for:**
  - Hour of the day (`hour_sin`, `hour_cos`)
  - Month of the year (`month_sin`, `month_cos`)
  - Minute of the hour (`minute_sin`, `minute_cos`)
  - Angle (`anglez_sin`, `anglez_cos`)
- **Differences between consecutive values for:**
  - Angle (`anglez_diff`)
  - ENMO (`enmo_diff`)
- **Rolling medians for differences with a window size of 5 * 12 for:**
  - Angle (`anglez_diff_rolling_median`)
  - ENMO (`enmo_diff_rolling_median`)
- **Reverse rolling medians for differences with a window size of 5 * 12 for:**
  - Angle (`anglez_diff_rolling_median_reverse`)
  - ENMO (`enmo_diff_rolling_median_reverse`)
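
The feature families above can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the project's pipeline: it assumes `anglez` is given in degrees, that differences are padded with a leading 0.0, and that the reverse rolling median is a rolling median over the time-reversed series. All function names are hypothetical. With 5-second steps, a window of 5 * 12 = 60 steps covers 5 minutes.

```python
import math
from statistics import median


def cyclical(value, period):
    """Encode a periodic value (e.g. hour with period 24) as (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def angle_components(anglez_degrees):
    """Sine and cosine of the arm angle (assumes degrees as input)."""
    rad = math.radians(anglez_degrees)
    return math.sin(rad), math.cos(rad)


def diff(values):
    """Difference between consecutive values, padded to keep the length."""
    return [0.0] + [b - a for a, b in zip(values, values[1:])]


def rolling_median(values, window):
    """Trailing-window median; the first windows are shorter."""
    return [
        median(values[max(0, i - window + 1): i + 1])
        for i in range(len(values))
    ]


def reverse_rolling_median(values, window):
    """Rolling median over the reversed series, flipped back, so each step
    summarises the values that follow it."""
    return rolling_median(values[::-1], window)[::-1]


hour_sin, hour_cos = cyclical(6, 24)
anglez_diff = diff([1, 3, 6])
forward = rolling_median([1, 5, 2, 8], window=3)
backward = reverse_rolling_median([1, 5, 2, 8], window=3)
print(hour_sin, anglez_diff, forward, backward)
```

The sine and cosine pair keeps values such as 23:00 and 00:00 close together, which a raw hour number would not.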
  
**Labels**
- **shape**: (`cfg.duration` / `cfg.downsample_rate`, 3), (1920, 3) in the current best model
- 3 values are (is_asleep (0 or 1), onset, wakeup)
- Each label contains either an onset, a wakeup, or background (no label)
- If background, all values are 0 for all 1920 steps
- If onset or wakeup, the event is converted to a Gaussian label: the label is still 1 at the annotated time step, with soft labels around it that follow a normal distribution.
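
A Gaussian label of this kind can be sketched as follows. The width `sigma` and the way the annotated step is mapped to a downsampled index are assumptions for illustration. The values 1920 and 3 come from the label shape and downsample rate above.

```python
import math


def gaussian_label(length, center, sigma):
    """Soft label that equals 1.0 at `center` and decays like an
    unnormalised normal density on either side."""
    return [
        math.exp(-((i - center) ** 2) / (2 * sigma ** 2))
        for i in range(length)
    ]


DOWNSAMPLE_RATE = 3
event_offset = 3000  # hypothetical event position within the input window
label = gaussian_label(1920, center=event_offset // DOWNSAMPLE_RATE, sigma=10)
print(label[1000], round(label[990], 4))
```

Soft labels give the model partial credit for predictions that are a few steps off, in line with the tolerance-based metric.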

### Final Model Description: Segmentation model with encoder and decoder
* LSTM feature extractor 
* Unet decoder 

### Post processing
For each event type, onset and wakeup, we find the peaks in the segmentation model's predictions and keep only those as our predictions.
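
Peak picking of this kind can be sketched without dependencies. A library routine such as `scipy.signal.find_peaks` with its `height` and `distance` arguments is a common alternative. The function below and its `threshold` and `min_distance` parameters are illustrative assumptions, not the project's exact post-processing.

```python
def find_peaks(scores, threshold, min_distance):
    """Return (index, score) for local maxima at or above `threshold`,
    keeping the highest peak when two are closer than `min_distance`."""
    last = len(scores) - 1
    candidates = [
        (i, s)
        for i, s in enumerate(scores)
        if s >= threshold
        and (i == 0 or s > scores[i - 1])
        and (i == last or s >= scores[i + 1])
    ]
    kept = []
    for i, s in sorted(candidates, key=lambda c: -c[1]):
        if all(abs(i - j) >= min_distance for j, _ in kept):
            kept.append((i, s))
    return sorted(kept)


onset_scores = [0.0, 0.2, 0.9, 0.3, 0.1, 0.4, 0.8, 0.2]
peaks = find_peaks(onset_scores, threshold=0.3, min_distance=3)
print(peaks)
```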

### Validation
* 20% validation split for valid_set 1
* Kaggle public leaderboard for valid_set 2
* Kaggle private leaderboard for the final test set
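
A 20% hold-out can be drawn at the series level so that no recording appears in both sets. Whether the project split by series or by some other unit is not stated here, so the sketch below is an assumption. `split_series` and the seed are hypothetical.

```python
import random


def split_series(series_ids, valid_fraction=0.2, seed=0):
    """Shuffle unique series ids reproducibly and hold out a fraction."""
    ids = sorted(set(series_ids))
    random.Random(seed).shuffle(ids)
    n_valid = round(len(ids) * valid_fraction)
    return ids[n_valid:], ids[:n_valid]


train_ids, valid_ids = split_series([f"series_{i}" for i in range(10)])
print(len(train_ids), len(valid_ids))
```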

### Notable Experiments

| Model                          | Brief Description             | Valid_1 Score |
|--------------------------------|-------------------------------|---------------|
| Baseline                       | Config defaults               | 0.74          |
| v1_ds3                         | downsample_rate 2 to 3        | 0.7546        |
| v2_ds3                         | Added rolling median features | 0.7565        |
| v2_ds3_fe_LSTMFeatureExtractor | Chosen model                  | 0.7598        |


### Manual post processing: + .014 validation score
In order to improve the metric, I visually inspected the predictions compared to the ground truth and tried to find simple methods to adjust predictions to improve scores. 
 
**NOTE** These adjustments are done sequentially, so optimized parameters may differ if the order is changed

| Technique | Brief Description | valid_1 score |
|-----------|-------------------|---------------|
| Lower threshold | threshold = .005 | .765 |
| filter_by_min_max_th | Remove all predictions for a night where the min(max of onset and wakeup score) < `th` = .03 | .767 |
| filter_(onset or wakeup)_threshold | Per night, if max is above .82, only keep max prediction | .768 |
| filter_max_score_by_night | Per night, eliminate all predictions if max is not above `th` = .03 | .769 |
| inflate_max_wakeup | Per night, find max wakeup score and inflate it by `multiplier` = 3.2 | .771 |
| inflate_max_onset | Per night, find max onset score and inflate it by `multiplier` = 13.4 | .774 |
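
Two of these adjustments can be sketched as below. The sketch assumes each prediction is a dict with hypothetical `night`, `event`, and `score` fields. How predictions are assigned to nights is not shown, and the project's own functions may differ in detail.

```python
def filter_max_score_by_night(preds, th):
    """Drop every prediction for a night whose best score is not above th."""
    best = {}
    for p in preds:
        best[p["night"]] = max(best.get(p["night"], float("-inf")), p["score"])
    return [p for p in preds if best[p["night"]] > th]


def inflate_max(preds, event, multiplier):
    """Per night, multiply the top-scoring `event` prediction's score."""
    top = {}  # night -> index of the best-scoring prediction for `event`
    for idx, p in enumerate(preds):
        if p["event"] != event:
            continue
        night = p["night"]
        if night not in top or p["score"] > preds[top[night]]["score"]:
            top[night] = idx
    boosted = set(top.values())
    return [
        {**p, "score": p["score"] * multiplier} if idx in boosted else dict(p)
        for idx, p in enumerate(preds)
    ]


preds = [
    {"night": 1, "event": "onset", "score": 0.5},
    {"night": 1, "event": "onset", "score": 0.2},
    {"night": 1, "event": "wakeup", "score": 0.4},
    {"night": 2, "event": "onset", "score": 0.01},
]
kept = filter_max_score_by_night(preds, th=0.03)
inflated = inflate_max(kept, event="onset", multiplier=2.0)
print([p["score"] for p in inflated])
```

Inflated scores can exceed 1. That is harmless for this metric, because average precision depends only on the ordering of the scores.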

### Test score: .758 



### References

Nathalia Esper, Maggie Demkin, Ryan Hoolbrok, Yuki Kotani, Larissa Hunt, Andrew Leroux, Vincent van Hees, Vadim Zipunnikov, Kathleen Merikangas, Michael Milham, Alexandre Franco, Gregory Kiar. (2023). Child Mind Institute - Detect Sleep States. Kaggle. https://kaggle.com/competitions/child-mind-institute-detect-sleep-states