# Johnny Stuto CS 575: PreProcessing 2

## Neural Network Implementation 

### Problem Definition: Address the issue of sleep/wake prediction based on data from wrist health monitors

In [1]:
import numpy as np
import gc
from datetime import datetime
import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.simplefilter(action='ignore', category=Warning)

In [2]:
file_path = 'C:/temp/clean_zzz.csv'
zzz = pd.read_csv(file_path)

In [3]:
# create candidate time covariates for the model:
# elapsed seconds since the first timestamp in the file (total_seconds() keeps whole days)
ts = pd.to_datetime(zzz['timestamp'], format='%Y-%m-%dT%H:%M:%S%z', utc=True)
zzz['seconds_since_start'] = (ts - ts.iloc[0]).dt.total_seconds()
zzz = zzz.drop(columns=['timestamp', 'step'])
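
When turning timestamps into elapsed time, note that `timedelta.seconds` is only the seconds component (0 to 86,399) and drops whole days, while `timedelta.total_seconds()` returns the full duration. A small check, using made-up timestamps in the same ISO-8601-with-offset format the `timestamp` column is parsed with:

```python
from datetime import datetime

import pandas as pd

# illustrative timestamps, 25 hours apart
start = datetime(2018, 8, 14, 0, 0, 0)
later = datetime(2018, 8, 15, 1, 0, 0)
delta = later - start

print(delta.days, delta.seconds)  # 1 3600: .seconds ignores the whole day
print(delta.total_seconds())      # 90000.0: full elapsed time

# vectorized equivalent on a pandas Series of strings with UTC offsets
ts = pd.to_datetime(
    pd.Series(["2018-08-14T00:00:00-0400", "2018-08-15T01:00:00-0400"]),
    format="%Y-%m-%dT%H:%M:%S%z",
    utc=True,
)
elapsed = (ts - ts.iloc[0]).dt.total_seconds()
```

On a long recording, reading `.seconds` would make the covariate wrap back to zero every 24 hours instead of growing monotonically.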

In [4]:
# sort by series_id, then by seconds_since_start
zzz_timesort = zzz.sort_values(by=['series_id', 'seconds_since_start'])

Z-angle (`anglez`): the angle between the accelerometer axis perpendicular to the skin surface and the horizontal plane.
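
The notebook uses `anglez` as provided in the data. Assuming it follows the common GGIR-style definition, $\text{anglez} = \arctan\left(z / \sqrt{x^2 + y^2}\right) \cdot 180/\pi$ with $z$ the axis perpendicular to the skin, it can be reproduced from raw tri-axial accelerations as sketched below. The function name `anglez` and the raw `x`, `y`, `z` inputs are assumptions; this dataset does not include raw axes.

```python
import numpy as np


def anglez(x, y, z):
    """Angle (degrees) between the device z-axis and the horizontal plane.

    Assumes the GGIR-style definition; x, y, z are raw accelerations
    (any common unit, since only their ratio matters).
    """
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    # arctan2 handles x = y = 0 and yields values in [-90, 90] here
    return np.degrees(np.arctan2(z, np.sqrt(x**2 + y**2)))
```

A device lying flat with z pointing up gives 90, and tilting z into the horizontal plane gives 0.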

ENMO (`enmo`): The Euclidean Norm Minus One, expressed in g with negative values rounded to zero, has been shown to correlate with the magnitude of acceleration and with human energy expenditure [16]. ENMO is computed as follows:

$ \text{ENMO} = \max\left(0,\ \sqrt{x^2 + y^2 + z^2} - 1\right) $

where $x$, $y$, $z$ are the accelerations along the three device axes, in g.
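
The ENMO formula translates directly into NumPy. This dataset already ships an `enmo` column, so the sketch only illustrates the computation; the raw `x`, `y`, `z` inputs (in g) and the function name `enmo` are assumptions.

```python
import numpy as np


def enmo(x, y, z):
    """Euclidean Norm Minus One (g): vector magnitude minus 1 g, clipped at 0."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    # subtract gravity (1 g) from the vector norm, then round negatives up to 0
    return np.maximum(np.sqrt(x**2 + y**2 + z**2) - 1.0, 0.0)
```

For example, a reading of (3, 4, 0) g has norm 5 g and ENMO 4, while (0.5, 0, 0) g would give -0.5 and is clipped to 0.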


## Data Processing Overview

### Feature Standardization
- The features `anglez`, `enmo` (Euclidean Norm Minus One), and `seconds_since_start` are standardized by subtracting their mean and dividing by their standard deviation.
  - **Standardized `anglez`**: `zzz_timesort['anglez_std']`
  - **Standardized `enmo`**: `zzz_timesort['enmo_std']`
  - **Standardized `seconds_since_start`**: `zzz_timesort['sss_std']`
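
In this notebook the means and standard deviations are computed over the full dataset before the train/test split, so test rows also influence the scaling. A common alternative, sketched here as a hypothetical helper `standardize_train_test` that assumes the split is done first, fits the statistics on the training rows only and reuses them for the test rows:

```python
import pandas as pd


def standardize_train_test(train, test, cols):
    """Z-score `cols` using mean/std estimated on the training rows only (hypothetical helper)."""
    mu = train[cols].mean()
    sigma = train[cols].std()  # pandas default ddof=1, as in the notebook
    train_std = (train[cols] - mu) / sigma
    test_std = (test[cols] - mu) / sigma  # test rows reuse the training statistics
    return train_std.add_suffix("_std"), test_std.add_suffix("_std")


# toy usage with made-up values
train = pd.DataFrame({"anglez": [1.0, 2.0, 3.0]})
test = pd.DataFrame({"anglez": [4.0]})
tr, te = standardize_train_test(train, test, ["anglez"])
```

Here the training column has mean 2 and standard deviation 1, so the test value 4.0 maps to 2.0.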

### Raw Features Creation
- Original (raw) values of `anglez`, `enmo`, and `seconds_since_start` are stored in separate columns:
  - **Raw `anglez`**: `zzz_timesort['anglez_raw']`
  - **Raw `enmo`**: `zzz_timesort['enmo_raw']`
  - **Raw `seconds_since_start`**: `zzz_timesort['sss_raw']`

### Feature Selection
- All standardized features (columns with 'std' in their name) are stored in `X`.
- All raw features (columns with 'raw' in their name) are stored in `X_raw`.
- The target variable `awake` is stored in `y`.
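
The substring-based selection can also be written with pandas `DataFrame.filter(like=...)`, which keeps every column whose label contains the given string, in the original column order. The toy frame below uses made-up values. The same caveat applies to both forms: any other column whose name happens to contain "std" or "raw" would be selected too.

```python
import pandas as pd

# toy frame with illustrative values and the notebook's column naming
df = pd.DataFrame({
    "anglez": [10.0], "enmo": [0.02],
    "anglez_std": [0.1], "enmo_std": [-0.3],
    "anglez_raw": [10.0], "enmo_raw": [0.02],
    "awake": [1],
})

X = df.filter(like="std")      # standardized features
X_raw = df.filter(like="raw")  # unscaled copies
y = df["awake"]                # target
```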

### Data Splitting
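- Both standardized and raw datasets are split into training and testing subsets using a sequential (unshuffled) 79-21 split: the first 79% of rows, in `series_id`/time order, go to training and the remaining 21% to testing.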
  - **Standardized Data**:
    - Training Data: `X_train`, `y_train`
    - Testing Data: `X_test`, `y_test`
  - **Raw Data**:
    - Training Data: `X_tr`, `y_train`
    - Testing Data: `X_te`, `y_test`
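
Because the split is positional on rows sorted by `series_id` and time, the series at the 79% cut point usually ends up partly in training and partly in testing. One option, sketched here as a hypothetical helper `split_by_series`, moves the cut back to the nearest series boundary so each series lands wholly on one side. It assumes the frame is already sorted as above.

```python
import pandas as pd


def split_by_series(df, id_col="series_id", train_frac=0.79):
    """Sequential split that keeps each series entirely in train or test (hypothetical helper)."""
    # rows per series, in order of first appearance (df is assumed pre-sorted)
    sizes = df.groupby(id_col, sort=False).size()
    ends = sizes.cumsum()  # cumulative row count at the end of each series
    target = train_frac * len(df)
    # series that finish at or before the target row count go to training
    train_ids = ends[ends <= target].index
    mask = df[id_col].isin(train_ids)
    return df[mask], df[~mask]


# toy usage: 10 rows across three series
df = pd.DataFrame({"series_id": list("aaaabbbccc"), "v": range(10)})
train, test = split_by_series(df)
```

With 10 rows the 79% target is 7.9 rows; series `a` and `b` end at rows 4 and 7, so they go to training and `c` goes to testing. The trade-off is a training share that is slightly below 79%.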



In [7]:
# Standardize features (z-scores): seconds_since_start, anglez, and enmo (Euclidean Norm Minus One)
zzz_timesort['anglez_std'] = (zzz_timesort['anglez'] - zzz_timesort['anglez'].mean()) / zzz_timesort['anglez'].std()
zzz_timesort['enmo_std'] = (zzz_timesort['enmo'] - zzz_timesort['enmo'].mean()) / zzz_timesort['enmo'].std()
zzz_timesort['sss_std'] = (zzz_timesort['seconds_since_start'] - zzz_timesort['seconds_since_start'].mean()) / zzz_timesort['seconds_since_start'].std()

# keep unscaled copies of the same features
zzz_timesort['anglez_raw'] = zzz_timesort['anglez']
zzz_timesort['enmo_raw'] = zzz_timesort['enmo']
zzz_timesort['sss_raw'] = zzz_timesort['seconds_since_start']


feats = [col for col in zzz_timesort.columns if 'std' in col]
X = zzz_timesort[feats]
y = zzz_timesort['awake']

feats_raw = [col for col in zzz_timesort.columns if 'raw' in col]
X_raw = zzz_timesort[feats_raw]

# split the standardized data (79-21)
train_size = int(0.79 * len(zzz_timesort))
X_train, X_test = X.iloc[:train_size], X.iloc[train_size:]
y_train, y_test = y.iloc[:train_size], y.iloc[train_size:]

# split the raw data (79-21); the target is unchanged, so y_train / y_test are reused
X_tr, X_te = X_raw.iloc[:train_size], X_raw.iloc[train_size:]