# AAI-530 - Final Team Project - Group 6

## Overview
This notebook documents the development of a machine learning IoT application for predicting sleep quality using a Fitbit dataset. The project encompasses data loading, exploratory data analysis, feature engineering, model building (including a deep learning LSTM model and a traditional machine learning classifier), and hyperparameter tuning.

## Objectives
* To predict the `overall_score` of sleep quality using LSTM, satisfying both the deep learning and time series requirements of the project.
* To classify two sleep-quality features, deep sleep and restlessness, into three categories (poor/average/good and low/average/high, respectively) for daily smartwatch notifications using a traditional ML classifier.

## PROJECT CONSTANTS

In [None]:
# ---------------------------------------------------------------------------- #
# GOOGLE DRIVE
# ---------------------------------------------------------------------------- #
GOOGLE_DRIVE_FOLDER_PATH = "530 Final/IoT AAI-530 Final Project"

# ---------------------------------------------------------------------------- #
# GITHUB
# ---------------------------------------------------------------------------- #
REPO_DIR = "project"
REPO_URL = "https://github.com/aai530-group6/project.git"

# ---------------------------------------------------------------------------- #
# DATASET
# ---------------------------------------------------------------------------- #
DATASET_FILENAME = "sleep_score_data_fitbit.csv"
FITBIT_SLEEP_SCORE_DATASET_URL = f"https://huggingface.co/datasets/aai530-group6/sleep-score-fitbit/resolve/main/{DATASET_FILENAME}?download=true"

## INSTALLS

In [None]:
%%bash
apt-get install -y wkhtmltopdf mono-complete

pip install --quiet --progress-bar off \
    black[jupyter] \
    dataprep \
    huggingface-hub \
    isort \
    pdfkit \
    pythonnet \
    wkhtmltopdf

Reading package lists...
Building dependency tree...
Reading state information...
mono-complete is already the newest version (6.8.0.105+dfsg-3.2).
wkhtmltopdf is already the newest version (0.12.6-2).
0 upgraded, 0 newly installed, 0 to remove and 34 not upgraded.


ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
bigframes 0.21.0 requires sqlalchemy<3.0dev,>=1.4, but you have sqlalchemy 1.3.24 which is incompatible.
ipython-sql 0.5.0 requires sqlalchemy>=2.0, but you have sqlalchemy 1.3.24 which is incompatible.
panel 1.3.8 requires bokeh<3.4.0,>=3.2.0, but you have bokeh 2.4.3 which is incompatible.


## IMPORTS

In [None]:
import contextlib
import hashlib
import os
import pathlib
import zipfile

import clr
import google.colab
import google.colab.drive
import isort
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pdfkit
import requests
import seaborn as sns
import System
from dataprep.datasets import load_dataset
from dataprep.eda import (create_report, plot, plot_correlation, plot_diff,
                          plot_missing)
from keras.preprocessing.sequence import pad_sequences
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import (classification_report, confusion_matrix,
                             mean_absolute_error, mean_squared_error, r2_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from tensorflow.keras.callbacks import EarlyStopping, LambdaCallback
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2

# NOTE: `from EvoPdf import HtmlToPdfConverter` only resolves after
# clr.AddReference(".../evohtmltopdf.dll") has been called, i.e. after the
# EvoHtmlToPdf archive is unzipped in SETUP (see the commented-out PDF cell).

In [None]:
# IMPORT SORTER
import_string = """
import contextlib
import hashlib
import os
import pathlib
import zipfile

import clr
import google.colab
import google.colab.drive
import isort
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pdfkit
import requests
import seaborn as sns
import System
from dataprep.datasets import load_dataset
from dataprep.eda import (create_report, plot, plot_correlation, plot_diff,
                          plot_missing)
from keras.preprocessing.sequence import pad_sequences
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import (classification_report, confusion_matrix,
                             mean_absolute_error, mean_squared_error, r2_score)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from tensorflow.keras.callbacks import EarlyStopping, LambdaCallback
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.regularizers import l2
"""
print(isort.code(import_string))

## HELPER FUNCTIONS

In [None]:
def unzip(zip_file_path: str, extraction_directory: str) -> None:
    """
    Unzips a zip file into a specified directory, checking if the zip file
    exists before extraction.

    :param zip_file_path: Path to the zip file.
    :type zip_file_path: str
    :param extraction_directory: Target directory for extraction.
    :type extraction_directory: str
    :return: None
    """
    if os.path.exists(zip_file_path):
        with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
            zip_ref.extractall(extraction_directory)
        print("Extraction completed successfully.")
    else:
        print("The zip file does not exist.")


def compute_hash(file_path: str) -> str:
    """
    Computes the SHA-256 hash for a given file, reading in binary mode and
    processing in blocks for efficiency with large files.

    :param file_path: Path to the file for hash computation.
    :type file_path: str
    :return: Hexadecimal string of the file's SHA-256 hash.
    :rtype: str
    """
    sha256_hash = hashlib.sha256()
    with open(file_path, 'rb') as f:
        for byte_block in iter(lambda: f.read(4096), b""):
            sha256_hash.update(byte_block)
    return sha256_hash.hexdigest()


def should_download(url: str, save_path: str) -> bool:
    """
    Determines if a file should be downloaded by checking its existence and
    comparing the content hash with the online version.

    :param url: URL of the file to potentially download.
    :type url: str
    :param save_path: Local path where the file would be saved.
    :type save_path: str
    :return: True if download is needed, False otherwise.
    :rtype: bool
    """
    try:
        response = requests.get(url, stream=True, timeout=60)
        # DON'T DOWNLOAD IF INACCESSIBLE
        if response.status_code != 200:
            return False
        # DON'T DOWNLOAD IF SAME FILE
        downloaded_hash = hashlib.sha256(response.content).hexdigest()
        if os.path.exists(save_path):
            existing_hash = compute_hash(save_path)
            if existing_hash == downloaded_hash:
                return False

    except Exception as e:
        print(f"An error occurred: {e}")
        return False
    return True


def download(url: str, save_path: str) -> None:
    """
    Downloads a file from a URL to a local path if it's absent or outdated.
    Raises an exception for non-200 HTTP status during download.

    :param url: URL of the file to download.
    :type url: str
    :param save_path: Local path to save the downloaded file.
    :type save_path: str
    :raises Exception: For non-200 HTTP status during download.
    :return: None. Directly writes to disk if download occurs.
    :rtype: None
    """
    if should_download(url, save_path):
        response = requests.get(url, timeout=60)
        if response.status_code != 200:
            raise Exception(f"DOWNLOAD FAILED. STATUS: {response.status_code}")
        with open(save_path, 'wb') as f:
            f.write(response.content)
        print(f"DOWNLOADED FILE: {save_path}.")
    else:
        # SKIPPED: LOCAL COPY MATCHES THE REMOTE FILE, OR THE URL WAS UNREACHABLE
        print(f"DOWNLOAD SKIPPED: {save_path}.")


def create_google_drive_shortcut(target_folder_name: str = "temp") -> str:
    """
    Creates a shortcut to a Google Drive folder, ensuring easier access by
    mounting Google Drive and creating a symlink in the Colab environment.

    :param target_folder_name: Name of the Google Drive folder for the shortcut.
    :type target_folder_name: str
    :return: Path to the symlink created in the Colab environment.
    :rtype: str
    """
    # MOUNT GOOGLE DRIVE
    with contextlib.redirect_stdout(open(os.devnull, 'w')):
        google.colab.drive.mount("/content/drive", force_remount=True)

    # DEFINE BASE PATHS FOR DRIVE AND SHORTCUTS
    base_drive_path = pathlib.Path("/content/drive/My Drive")
    project_path = base_drive_path / target_folder_name
    shortcut_folder_name = target_folder_name.split('/')[-1]
    shortcut_path = pathlib.Path(f"/content/{shortcut_folder_name}")

    # ENSURE PROJECT FOLDER AND PARENT FOLDERS EXIST IN GOOGLE DRIVE
    project_path.mkdir(parents=True, exist_ok=True)

    # CREATE FOLDER SHORTCUT IF NON-EXISTENT
    if not shortcut_path.exists() or not shortcut_path.is_symlink():
        if shortcut_path.exists():
            shortcut_path.unlink()
        shortcut_path.symlink_to(project_path, target_is_directory=True)
    print(f"FOLDER SHORTCUT: {shortcut_path} --> {project_path}")
    return str(shortcut_path)


def git_clone(base_dir: str, repo_dir: str, url: str, gh_token: str) -> None:
    """
    Clones a Git repository into a specified directory. If the directory does
    not exist, it is created. Uses a GitHub read-only token for authentication.
    Prints a message indicating the repository's clone status.

    :param base_dir: The base directory for cloning the repository.
    :type base_dir: str
    :param repo_dir: The directory name for the cloned repository.
    :type repo_dir: str
    :param url: The HTTP URL of the Git repository to clone.
    :type url: str
    :param gh_token: The GitHub read-only access token for authentication.
    :type gh_token: str
    :return: None. Indicate clone repository status at the specified directory.
    """
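    clone_path = os.path.join(base_dir, repo_dir)
    # EMBED THE READ-ONLY TOKEN IN THE HTTPS URL FOR AUTHENTICATION
    auth_repo_url = url.replace("https://", f"https://{gh_token}@", 1)
    if not os.path.exists(clone_path):
        !git clone "{auth_repo_url}" "{clone_path}"
    print(f"CLONED: {clone_path}")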
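
`should_download` already fetches the whole file in order to hash it, and `download` then requests the same URL a second time. A minimal single-request alternative is sketched below: the caller passes in the bytes it has already fetched, and the helper writes them only when their SHA-256 differs from the file on disk. `write_if_changed` is a hypothetical helper, not part of the original notebook.

```python
import hashlib
import os


def write_if_changed(content: bytes, save_path: str) -> bool:
    """Write `content` to `save_path` unless an identical file already exists.

    Returns True if the file was (re)written, False if it was already up to date.
    """
    new_hash = hashlib.sha256(content).hexdigest()
    if os.path.exists(save_path):
        existing = hashlib.sha256()
        with open(save_path, "rb") as f:
            # Hash the existing file in 4 KiB blocks, as compute_hash does
            for block in iter(lambda: f.read(4096), b""):
                existing.update(block)
        if existing.hexdigest() == new_hash:
            return False
    with open(save_path, "wb") as f:
        f.write(content)
    return True
```

In the notebook this would pair with one request, e.g. `response = requests.get(FITBIT_SLEEP_SCORE_DATASET_URL, timeout=60)`, then `response.raise_for_status()`, then `write_if_changed(response.content, DATASET_FILEPATH)`.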


## SETUP

In [None]:
# REMOVE SAMPLE DATA FOLDER
!rm -rf /content/sample_data

# CREATE GOOGLE DRIVE FOLDER SHORTCUT FOR SPEED AND PERSISTENCE
SHORTCUT = create_google_drive_shortcut(GOOGLE_DRIVE_FOLDER_PATH)

# CLONE GITHUB REPOSITORY
CLONE_PATH = os.path.join(SHORTCUT, REPO_DIR)
if not os.path.exists(CLONE_PATH):
    !cd "{SHORTCUT}" && git clone "{REPO_URL}"

# DOWNLOAD DATASET TO GOOGLE DRIVE
DATASET_FILEPATH = os.path.join(SHORTCUT, DATASET_FILENAME)
download(FITBIT_SLEEP_SCORE_DATASET_URL, DATASET_FILEPATH)

# UNZIP EVOHTMLTOPDF FROM THE CLONED REPOSITORY
unzip(os.path.join(CLONE_PATH, "files", "EvoHtmlToPdf-v10.0.zip"), "/content/EvoHtmlToPdf")

!cp "/content/EvoHtmlToPdf/evointernal.dat" ""

## LOAD

In [None]:
# Load the dataset from a CSV file into a pandas DataFrame
dataset_df = pd.read_csv(DATASET_FILEPATH)

# Initialize an empty dictionary to store different versions of DataFrames
dfs = {}

# Copy the original DataFrame and store in the dictionary with key 'sleep_score'
# This keeps the original dataset unchanged while working on its copy
dfs['sleep_score'] = dataset_df.copy()

# Retrieve and assign the copied DataFrame from the dictionary to 'df'
# This allows for easier manipulation of this specific dataset version
df = dfs['sleep_score']

# Exploratory Data Analysis

In [None]:
plot_missing(df)

In [None]:
plot_correlation(df, "overall_score")

In [None]:
EDA_REPORT_FILENAME = "exploratory_data_analysis"
report = create_report(df, title="Exploratory Data Analysis")

In [None]:
# report.save(f"{SHORTCUT}/{EDA_REPORT_FILENAME}")
# html =f"/{SHORTCUT}/{EDA_REPORT_FILENAME}.html"

# import clr
# import System
# # ADD REFERENCE TO THE EVOPDF DLL
# clr.AddReference("/content/EvoHtmlToPdf/evohtmltopdf.dll")
# from EvoPdf import HtmlToPdfConverter

# # DEFINE THE PATH TO YOUR HTML FILE
# html_file_path = f"/{SHORTCUT}/{EDA_REPORT_FILENAME}.html"

# # READ THE HTML CONTENT FROM THE FILE
# with open(html_file_path, 'r', encoding='utf-8') as file:
#     html_content = file.read()

# # INITIALIZE THE HTML TO PDF CONVERTER
# converter = HtmlToPdfConverter()
# converter.PdfToolFullPath = "/content/EvoHtmlToPdf"
# converter.ConversionDelay = 0

# pdf_bytes = converter.ConvertHtml(html_content, "")

# output_file_path = f"{SHORTCUT}/{EDA_REPORT_FILENAME}.pdf"
# System.IO.File.WriteAllBytes(output_file_path, pdf_bytes)


In [None]:
report

In [None]:
plot_correlation(df)

In [None]:
# Datetime
df['timestamp'] = pd.to_datetime(df['timestamp'])
df.sort_values('timestamp', inplace=True)


# Calculate day of the week
df['day_of_week'] = df['timestamp'].dt.day_name()

# Label-encode day_of_week
label_encoder = LabelEncoder()
df['day_of_week_encoded'] = label_encoder.fit_transform(df['day_of_week'])
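
`LabelEncoder` numbers classes in sorted (alphabetical) order, so `day_of_week_encoded` does not follow the calendar: when all seven days are present, Friday becomes 0 and Wednesday becomes 6. If calendar order matters, pandas' `dt.dayofweek` gives Monday=0 through Sunday=6 directly. The sketch below compares the two on four made-up dates; it mimics LabelEncoder's ordering with `sorted` rather than calling scikit-learn.

```python
import pandas as pd

# Four example nights (2020-01-06 was a Monday); illustrative dates only
timestamps = pd.to_datetime(pd.Series(["2020-01-06", "2020-01-07", "2020-01-11", "2020-01-12"]))
day_names = timestamps.dt.day_name()

# Alphabetical codes: the ordering a LabelEncoder fitted on all seven day names uses
all_days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
alphabetical_code = {name: code for code, name in enumerate(sorted(all_days))}
alphabetical = day_names.map(alphabetical_code)

# Calendar codes: Monday=0 ... Sunday=6
calendar = timestamps.dt.dayofweek

print(pd.DataFrame({"day": day_names, "alphabetical": alphabetical, "calendar": calendar}))
```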

In [None]:
# Missing values
missing_values = df.isnull().sum()
print("Missing values:\n", missing_values)

In [None]:
df.describe()

In [None]:
# Distribution of Overall Sleep Score
plt.figure(figsize=(10, 6))
sns.histplot(df['overall_score'], kde=True, bins=20)
plt.title('Overall Sleep Score Distribution')
plt.xlabel('Overall Score')
plt.ylabel('Frequency')
plt.show()

In [None]:
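# Correlation heatmap; numeric_only=True skips non-numeric columns such as
# 'day_of_week' (pandas >= 2.0 raises an error on them otherwise)
plt.figure(figsize=(10, 8))
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap='coolwarm', fmt=".2f")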
plt.title('Correlation Heatmap')
plt.show()

Notes

- Target variable: `overall_score`
- `deep_sleep_in_minutes` has a strong positive correlation with the target (0.70)
- `restlessness` has a notable negative correlation with the target (-0.40)
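
The notes above were read off the heatmap. The same ranking can be produced numerically, which makes values such as 0.70 and -0.40 easy to check. A minimal sketch on made-up data follows; the `demo` frame and the `rank_correlations` helper are illustrative, not from the notebook, and in practice `df` would be passed instead. `numeric_only=True` skips string columns such as `day_of_week`.

```python
import pandas as pd

# Made-up values purely to illustrate the call
demo = pd.DataFrame({
    "overall_score": [1, 2, 3, 4, 5],
    "deep_sleep_in_minutes": [2, 4, 6, 8, 10],
    "restlessness": [5, 5, 4, 4, 1],
    "resting_heart_rate": [60, 62, 59, 61, 60],
    "day_of_week": ["Mon", "Tue", "Wed", "Thu", "Fri"],  # non-numeric, skipped
})


def rank_correlations(frame, target="overall_score"):
    """Pearson correlation of every numeric column with `target`, strongest first."""
    corr = frame.corr(numeric_only=True)[target].drop(target)
    # Rank by magnitude so strong negative correlations are not buried
    return corr.sort_values(key=abs, ascending=False)


print(rank_correlations(demo))
```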

In [None]:
# Distribution of Deep Sleep
plt.figure(figsize=(10, 6))
sns.histplot(df['deep_sleep_in_minutes'], kde=True, bins=20)
plt.title('Deep Sleep (in minutes) Distribution')
plt.xlabel('Minutes')
plt.ylabel('Frequency')
plt.show()

In [None]:
# Distribution of Restlessness
plt.figure(figsize=(10, 6))
sns.histplot(df['restlessness'], kde=True, bins=20)
plt.title('Restlessness Distribution')
plt.xlabel('Score')
plt.ylabel('Frequency')
plt.show()

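# Feature Engineering

- Add lagged features (1 to 7 previous records) for `overall_score`, `deep_sleep_in_minutes`, and `restlessness`
- Add small Gaussian noise to selected features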

In [None]:
# Lag features for 'deep_sleep_in_minutes', 'overall_score', and 'restlessness' (target + 2 important features)
for lag in range(1, 8):
    df[f'deep_sleep_in_minutes_lag{lag}'] = df['deep_sleep_in_minutes'].shift(lag)
    df[f'overall_score_lag{lag}'] = df['overall_score'].shift(lag)
    df[f'restlessness_lag{lag}'] = df['restlessness'].shift(lag)

# Drop rows with NaN values created by lagging
df.dropna(inplace=True)
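
`shift(lag)` moves each value down by `lag` rows, so a row's `*_lag1` column holds the previous record's value; this is why the data must be sorted by `timestamp` first. The first `lag` rows have no predecessor and become NaN, so with lags 1 to 7 the `dropna` call removes the first 7 rows. A tiny illustration with two lags on five hypothetical scores:

```python
import pandas as pd

# Hypothetical five nights of scores, oldest first
nights = pd.DataFrame({"overall_score": [80, 82, 75, 90, 85]})
for lag in range(1, 3):
    nights[f"overall_score_lag{lag}"] = nights["overall_score"].shift(lag)

# The first two rows lack a lag-2 value and are dropped
nights = nights.dropna()
print(nights)
```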

In [None]:
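# Add small Gaussian noise (std = noise_strength) to selected current and lagged features
noise_strength = 0.02
features_to_noise = ['deep_sleep_in_minutes', 'resting_heart_rate', 'restlessness'] + \
                    [f'deep_sleep_in_minutes_lag{lag}' for lag in range(1, 8)] + \
                    [f'restlessness_lag{lag}' for lag in range(1, 8)]
rng = np.random.default_rng(42)  # fixed seed so the noise is reproducible across runs
for feature in features_to_noise:
    df[feature] += rng.normal(0, noise_strength, df.shape[0])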
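
The cell above adds noise with the same absolute standard deviation (0.02) to every listed column. Those columns sit on different scales (deep sleep is in minutes, while resting heart rate and restlessness have their own ranges; see `df.describe()`), so the same perturbation is negligible for some columns and relatively large for others. One alternative, sketched below under our own assumptions, scales the noise to each column's standard deviation. `add_relative_noise`, the 0.02 relative strength, and the seed are assumptions, not the notebook's method.

```python
import numpy as np
import pandas as pd


def add_relative_noise(frame, columns, relative_strength=0.02, seed=42):
    """Return a copy of `frame` with Gaussian noise added to `columns`.

    Each column's noise std is `relative_strength` times that column's own std.
    """
    rng = np.random.default_rng(seed)
    noisy = frame.copy()
    for column in columns:
        scale = relative_strength * noisy[column].std()
        noisy[column] = noisy[column] + rng.normal(0.0, scale, len(noisy))
    return noisy


# Illustrative values only
example = pd.DataFrame({"deep_sleep_in_minutes": [60, 75, 90, 105, 120],
                        "restlessness": [0.05, 0.08, 0.10, 0.12, 0.15]})
print(add_relative_noise(example, ["deep_sleep_in_minutes", "restlessness"]))
```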

# LSTM - Deep Learning Neural Network

In [None]:
# Define the sequence length and predictive horizon
sequence_length = 7
predictive_horizon = 1

# Set target, features, and exclude 'revitalization_score' due to 1.0 correlation with target
features_columns = ['deep_sleep_in_minutes', 'resting_heart_rate', 'restlessness'] + \
                   [f'deep_sleep_in_minutes_lag{lag}' for lag in range(1, 8)] + \
                   [f'overall_score_lag{lag}' for lag in range(1, 8)] + \
                   [f'restlessness_lag{lag}' for lag in range(1, 8)]

features = df[features_columns]
target = df['overall_score']

# Normalize the features
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_features = scaler.fit_transform(features)

X = []
y = []

# Generate sequences for feedforward neural network
for i in range(len(scaled_features) - sequence_length - predictive_horizon + 1):
    X.append(scaled_features[i:i + sequence_length].flatten())
    y.append(target.iloc[i + sequence_length + predictive_horizon - 1])

X = np.array(X)
y = np.array(y)

# Split the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, shuffle=False)

# Define the feedforward neural network model
model = Sequential()

# Dense layers
model.add(Dense(64, input_dim=X_train.shape[1], activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='linear'))

# Compile the model
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mse', 'mae'])

# Early stopping to prevent overfitting
early_stopping = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

# Train the model
history = model.fit(
    X_train, y_train,
    epochs=150, batch_size=32, validation_data=(X_val, y_val),
    callbacks=[early_stopping]
)

# Evaluate the model on the validation set
val_loss, val_mse, val_mae = model.evaluate(X_val, y_val)

print(f"Validation Loss: {val_loss}")
print(f"Validation Mean Squared Error: {val_mse}")
print(f"Validation Mean Absolute Error: {val_mae}")
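
The network above is feedforward: each 7-day window is flattened into one 168-value vector (7 rows × 24 features), so the imported `LSTM` layer is not used. A recurrent layer such as `tf.keras.layers.LSTM` instead expects 3D input shaped `(samples, timesteps, features)`. The sketch below uses a hypothetical helper, `make_lstm_windows`, to build those windows with the same indexing as the loop above, keeping the time axis instead of flattening it.

```python
import numpy as np


def make_lstm_windows(scaled_features, target, sequence_length=7, predictive_horizon=1):
    """Build (samples, timesteps, features) windows and aligned targets.

    Window i covers rows i .. i + sequence_length - 1; its target is the row
    `predictive_horizon` steps after the window's last row, as in the notebook loop.
    """
    n_windows = len(scaled_features) - sequence_length - predictive_horizon + 1
    X = np.stack([scaled_features[i:i + sequence_length] for i in range(n_windows)])
    offset = sequence_length + predictive_horizon - 1
    y = np.asarray(target)[offset:offset + n_windows]
    return X, y


# Illustrative data: 10 rows, 2 features
X_demo, y_demo = make_lstm_windows(np.arange(20, dtype=float).reshape(10, 2),
                                   np.arange(10, dtype=float))
print(X_demo.shape, y_demo)
```

With `X_seq, y_seq = make_lstm_windows(scaled_features, target)`, an LSTM layer would take input of shape `(sequence_length, len(features_columns))`, i.e. `(7, 24)`, and `X_seq.reshape(len(X_seq), -1)` recovers the flattened windows the feedforward model uses.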

# Visualizing Training and Validation Loss and MAE

In [None]:
# Plot training history
plt.figure(figsize=(10, 6))

# Plot training & validation loss values
plt.subplot(2, 1, 1)
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()

# Plot training & validation mean absolute error values
plt.subplot(2, 1, 2)
plt.plot(history.history['mae'], label='Training MAE')
plt.plot(history.history['val_mae'], label='Validation MAE')
plt.title('Mean Absolute Error')
plt.xlabel('Epoch')
plt.ylabel('MAE')
plt.legend()

plt.tight_layout()
plt.show()
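
The loss curves are easier to read when the epoch with the lowest validation loss is marked, since that is the epoch `EarlyStopping(monitor='val_loss')` tracks as the best. A small hypothetical helper, `best_epoch`, reads it from a Keras-style history dictionary:

```python
import numpy as np


def best_epoch(history_dict, monitor="val_loss"):
    """Return the 1-based epoch with the lowest `monitor` value, and that value."""
    values = np.asarray(history_dict[monitor], dtype=float)
    index = int(np.argmin(values))
    return index + 1, float(values[index])


# Illustrative, made-up history (not results from this notebook)
example_history = {"loss": [900.0, 400.0, 250.0, 240.0],
                   "val_loss": [850.0, 420.0, 300.0, 310.0]}
print(best_epoch(example_history))  # (3, 300.0)
```

For the real run, `epoch, _ = best_epoch(history.history)` followed by `plt.axvline(epoch - 1, linestyle='--')` on each subplot marks it; the `- 1` is needed because `plt.plot` indexes the epochs from 0.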

# Hyperparameter Tuning

In [None]:
import itertools


# Build a feedforward model for one hyperparameter combination
def create_model(units_layer1=64, dropout_rate=0.5, units_layer2=32, optimizer='adam'):
    model = Sequential()
    model.add(Dense(units_layer1, input_dim=X_train.shape[1], activation='relu'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(units_layer2, activation='relu'))
    model.add(Dense(1, activation='linear'))
    model.compile(loss='mean_squared_error', optimizer=optimizer, metrics=['mse', 'mae'])
    return model

# Define the hyperparameter grid (3 x 3 x 3 x 2 = 54 combinations)
param_grid = {
    'units_layer1': [32, 64, 128],
    'dropout_rate': [0.3, 0.5, 0.7],
    'units_layer2': [16, 32, 64],
    'optimizer': ['adam', 'rmsprop']
}

# Manual grid search scored on the chronological validation split
# (not sklearn's GridSearchCV, which would also cross-validate)
best_model = None
best_params = None
best_val_mse = float('inf')

for units_layer1, dropout_rate, units_layer2, optimizer in itertools.product(
    param_grid['units_layer1'],
    param_grid['dropout_rate'],
    param_grid['units_layer2'],
    param_grid['optimizer'],
):
    # Create and train the model
    model = create_model(units_layer1, dropout_rate, units_layer2, optimizer)
    model.fit(X_train, y_train, epochs=100, batch_size=32, verbose=0)

    # Evaluate on the validation set
    y_pred = model.predict(X_val, verbose=0)
    val_mse = mean_squared_error(y_val, y_pred)

    # Keep the best model and the hyperparameters that produced it
    if val_mse < best_val_mse:
        best_val_mse = val_mse
        best_model = model
        best_params = {
            'units_layer1': units_layer1,
            'dropout_rate': dropout_rate,
            'units_layer2': units_layer2,
            'optimizer': optimizer,
        }

# Print the best hyperparameters
print("Best Hyperparameters:", best_params)

# Evaluate the best model on the validation set (optimistic, since the same
# split was used to select the hyperparameters)
y_pred = best_model.predict(X_val, verbose=0)
val_mse = mean_squared_error(y_val, y_pred)
val_mae = mean_absolute_error(y_val, y_pred)

print(f"Validation Mean Squared Error: {val_mse}")
print(f"Validation Mean Absolute Error: {val_mae}")
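
The grid search scores each configuration on a single chronological 80/20 split, and the TODO list below mentions refining the CV. One time-aware option is expanding-window validation, where every validation block comes strictly after its training block. The numpy-only sketch below mirrors the default expanding-window scheme of scikit-learn's `TimeSeriesSplit`; `expanding_window_splits` is a hypothetical helper.

```python
import numpy as np


def expanding_window_splits(n_samples, n_splits=3):
    """Yield (train_idx, val_idx) pairs for time-ordered data.

    Each validation block has n_samples // (n_splits + 1) rows and starts
    right after its training block; training always begins at index 0.
    """
    fold_size = n_samples // (n_splits + 1)
    if fold_size == 0:
        raise ValueError("Too few samples for the requested number of splits.")
    first_val_start = n_samples - n_splits * fold_size
    for val_start in range(first_val_start, n_samples, fold_size):
        yield np.arange(val_start), np.arange(val_start, val_start + fold_size)


for train_idx, val_idx in expanding_window_splits(10, n_splits=3):
    print(train_idx.tolist(), val_idx.tolist())
```

Inside the grid search, each configuration would be trained on `X_train[train_idx]` and scored on `X_train[val_idx]`, averaging the MSE over folds, which keeps `X_val` untouched for a final check.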

# Feature Importance

In [None]:
target = df['overall_score']
features_columns = ['deep_sleep_in_minutes', 'resting_heart_rate', 'restlessness'] + \
                   [f'deep_sleep_in_minutes_lag{lag}' for lag in range(1, 8)] + \
                   [f'overall_score_lag{lag}' for lag in range(1, 8)]

features = df[features_columns]

# Feature importances
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(features, target)

importances = rf.feature_importances_
importances_series = pd.Series(importances, index=features_columns)
sorted_importances = importances_series.sort_values(ascending=False)

# Plot
plt.figure(figsize=(10, 6))
sns.barplot(x=sorted_importances, y=sorted_importances.index)
plt.xlabel('Importance')
plt.ylabel('Features')
plt.title('Random Forest - Feature Importance')
plt.show()

In [None]:
# Bin deep sleep based on z-scores
mean = df['deep_sleep_in_minutes'].mean()
std_dev = df['deep_sleep_in_minutes'].std()
df['z_score'] = (df['deep_sleep_in_minutes'] - mean) / std_dev
df['deep_sleep_category'] = df['z_score'].apply(lambda z: 'Poor' if z < -1 else ('Average' if z < 1 else 'Good'))

# Bin restlessness based on z-scores
mean_restlessness = df['restlessness'].mean()
std_dev_restlessness = df['restlessness'].std()
df['z_score_restlessness'] = (df['restlessness'] - mean_restlessness) / std_dev_restlessness
df['restlessness_category'] = df['z_score_restlessness'].apply(lambda z: 'Low' if z < -1 else ('Average' if z < 1 else 'High'))
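
The two `apply(lambda ...)` calls above implement a three-way split at z = -1 and z = +1: z < -1, -1 ≤ z < 1, and z ≥ 1. The same binning can be written once with `pd.cut` and left-closed intervals (`right=False`), which keeps the boundaries identical and returns an ordered categorical. `zscore_bins` is a hypothetical helper, sketched here as an alternative.

```python
import numpy as np
import pandas as pd


def zscore_bins(values, labels=("Poor", "Average", "Good")):
    """Bin a Series by z-score into [-inf, -1), [-1, 1), [1, inf)."""
    z = (values - values.mean()) / values.std()
    return pd.cut(z, bins=[-np.inf, -1, 1, np.inf], labels=list(labels), right=False)


# Illustrative values only
print(zscore_bins(pd.Series([0, 10, 10, 10, 10, 10, 10, 20])).tolist())
```

Applied to the notebook, `zscore_bins(df['deep_sleep_in_minutes'])` and `zscore_bins(df['restlessness'], labels=("Low", "Average", "High"))` would reproduce the two category columns.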

In [None]:
# Distribution of Deep Sleep - Categorical Variable (countplot suits categories better than a binned histogram)
plt.figure(figsize=(8, 4))
sns.countplot(x=df['deep_sleep_category'], order=['Poor', 'Average', 'Good'])
plt.title('Deep Sleep - Categorical Distribution')
plt.xlabel('Category')
plt.ylabel('Frequency')
plt.show()

In [None]:
# Plot the distribution of the restlessness categories
plt.figure(figsize=(8, 4))
sns.countplot(x=df['restlessness_category'], order=['Low', 'Average', 'High'])
plt.title('Restlessness - Categorical Distribution')
plt.xlabel('Category')
plt.ylabel('Frequency')
plt.show()

# Traditional ML Classifier

In [None]:
# Encode the new categorical variables
label_encoder_deep_sleep = LabelEncoder()
df['deep_sleep_category_encoded'] = label_encoder_deep_sleep.fit_transform(df['deep_sleep_category'])

label_encoder_restlessness = LabelEncoder()
df['restlessness_category_encoded'] = label_encoder_restlessness.fit_transform(df['restlessness_category'])

In [None]:
# Deep_sleep_category classification
deep_sleep_features_columns = ['resting_heart_rate', 'restlessness'] + \
                              [f'overall_score_lag{lag}' for lag in range(1, 8)]

deep_sleep_features = df[deep_sleep_features_columns]
deep_sleep_target = df['deep_sleep_category_encoded']

# Split
X_train_deep_sleep, X_val_deep_sleep, y_train_deep_sleep, y_val_deep_sleep = train_test_split(
    deep_sleep_features,
    deep_sleep_target,
    test_size=0.2,
    shuffle=False,
    random_state=0
)

deep_sleep_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
deep_sleep_classifier.fit(X_train_deep_sleep, y_train_deep_sleep)

y_val_pred_deep_sleep = deep_sleep_classifier.predict(X_val_deep_sleep)

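# Evaluate the classifier; codes follow LabelEncoder's alphabetical class order,
# so pass the class names to label the report and matrix rows
deep_sleep_labels = np.arange(len(label_encoder_deep_sleep.classes_))
print("Classification report for deep_sleep_category:")
print(classification_report(y_val_deep_sleep, y_val_pred_deep_sleep,
                            labels=deep_sleep_labels,
                            target_names=label_encoder_deep_sleep.classes_))
print("Confusion matrix for deep_sleep_category (rows: true, columns: predicted):")
print(confusion_matrix(y_val_deep_sleep, y_val_pred_deep_sleep, labels=deep_sleep_labels))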

In [None]:
# Restlessness_category classification
restlessness_features_columns = ['resting_heart_rate', 'deep_sleep_in_minutes'] + \
                                [f'overall_score_lag{lag}' for lag in range(1, 8)]

restlessness_features = df[restlessness_features_columns]
restlessness_target = df['restlessness_category_encoded']

# Split
X_train_restlessness, X_val_restlessness, y_train_restlessness, y_val_restlessness = train_test_split(
    restlessness_features,
    restlessness_target,
    test_size=0.2,
    shuffle=False,
    random_state=0
)

restlessness_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
restlessness_classifier.fit(X_train_restlessness, y_train_restlessness)

y_val_pred_restlessness = restlessness_classifier.predict(X_val_restlessness)

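# Evaluate the classifier; codes follow LabelEncoder's alphabetical class order,
# so pass the class names to label the report and matrix rows
restlessness_labels = np.arange(len(label_encoder_restlessness.classes_))
print("Classification report for restlessness_category:")
print(classification_report(y_val_restlessness, y_val_pred_restlessness,
                            labels=restlessness_labels,
                            target_names=label_encoder_restlessness.classes_))
print("Confusion matrix for restlessness_category (rows: true, columns: predicted):")
print(confusion_matrix(y_val_restlessness, y_val_pred_restlessness, labels=restlessness_labels))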
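
With cut-offs at ±1 standard deviation, the "Average" class tends to dominate: if a feature were roughly normal, about 68% of nights would fall in it and about 16% in each tail, so a classifier can score well by mostly predicting "Average". One common counter-measure is class weighting. scikit-learn's `class_weight='balanced'` option for `RandomForestClassifier` weights each class by `n_samples / (n_classes * count)`; the numpy sketch below computes those weights explicitly so they can be inspected. `balanced_class_weights` is a hypothetical helper.

```python
import numpy as np


def balanced_class_weights(y):
    """Map each class label to n_samples / (n_classes * count_of_that_class)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


# Illustrative labels: two rare tail classes around a common middle class
print(balanced_class_weights([0, 0, 1, 1, 1, 1, 1, 1, 2, 2]))
```

In the classifier cells, passing `class_weight='balanced'` to `RandomForestClassifier(...)` lets scikit-learn apply the same weighting without computing it by hand.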

# TODO:

Idea: try to predict how you're going to sleep the next day based on prior data.

Smart Watch Sleep Tracker - you're predicted to sleep poorly.
Time Series Predictor

Level of Deep Sleep was low (last night)
Be on the lookout

## BRANDON

* [ ] Refining the model and the CV.

## ALEC

* [ ] Do RF technique
* [ ] First part of documentation
* [ ] Classification work

## DIAGRAMS

1. [ ] DIAGRAM 1: ML1. LOSS GRAPHS
2. [ ] DIAGRAM 2: ML2. MAE
3. [ ] DIAGRAM 3: SUMMARY GRAPH - whatever major feature
4. [ ] DIAGRAM 4: STATUS GRAPH - current status of IOT device (perhaps next day)


* [ ] Create system diagram

Similar to Assignment 5, your submitted Tableau Public Dashboard must include:
- At least one status visualization - This visualization should tell the dashboard user something about the "current" status of their IoT device (e.g. number of devices online, current glucose level, etc.)
- At least one summary visualization - This visualization should tell the dashboard user something about the historical data from their IoT device (e.g. average device downtime over the last week, number of hypo/hyperglycemic readings over the last week)
- At least one visualization for each of your two machine learning insights - This visualization should communicate the insights created by your machine learning methods.
