# Junwon (Paul) Park

https://www.sktime.net/en/stable/api_reference/auto_generated/sktime.annotation.stray.STRAY.html

https://cookiecutter-data-science.drivendata.org/using-the-template/#open-a-notebook

In [45]:
import pandas as pd

# Importing Data

In [46]:
# Run the data preprocessing notebook so its functions (load_data, preprocess_data) are available here
%run "1.02-vr-ggs-data-preprocessing.ipynb"

In [47]:
columns_to_load = [
    'date', 'sender_id', 'bgl', 'bgl_date_millis', 'text', 'template', 'msg_type', 
    'affects_fob', 'affects_iob', 'dose_units', 'food_g', 'food_glycemic_index', 
    'dose_automatic', 'fp_bgl', 'message_basal_change', '__typename', 'trend'
]

# Call load_data with specified columns
df_list = load_data(
    data_dir='../data/raw',  # Replace with your actual directory path
    columns_to_load=columns_to_load,
    date_parse_columns=['date']
)


Loaded data from 2 files.
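
`load_data` is defined in the preprocessing notebook and is not shown here. Below is a rough, hypothetical sketch of what it might do, assuming it reads every CSV file in `data_dir` with `pd.read_csv` and returns one DataFrame per file. The name `load_data_sketch` is made up for illustration, and the real function may differ (for example in file ordering or filtering).

```python
import os
import pandas as pd

def load_data_sketch(data_dir, columns_to_load, date_parse_columns):
    """Hypothetical stand-in for load_data: one DataFrame per CSV file in data_dir."""
    # Sort file names so the order of the returned list is deterministic
    csv_files = sorted(f for f in os.listdir(data_dir) if f.endswith('.csv'))
    df_list = []
    for file in csv_files:
        # Read only the requested columns and parse the date columns on load
        df = pd.read_csv(
            os.path.join(data_dir, file),
            usecols=columns_to_load,
            parse_dates=date_parse_columns,
        )
        df_list.append(df)
    print(f"Loaded data from {len(df_list)} files.")
    return df_list
```

With this sketch, `df_list[0]` and `df_list[1]` would simply follow the alphabetical order of the file names.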


In [48]:
df_500030 = df_list[0]
df_679372 = df_list[1]

# Storing Meal Times and Other Event Times in a DataFrame

In [49]:
def filter_meal_related_events(df):
    """
    Filter the DataFrame to rows whose 'msg_type' is a meal-related or otherwise relevant event
    (meal announcements, snacks, exercise announcements and insulin doses).
    
    Parameters:
    - df : DataFrame, the original DataFrame containing 'msg_type' and 'date' columns.
    
    Returns:
    - filtered_df : DataFrame, containing only the relevant events with valid dates, sorted by date.
    """
    # Event types used to validate detected meal times
    relevant_msg_types = ['ANNOUNCE_MEAL', 'INTERVENTION_SNACK', 'ANNOUNCE_EXERCISE', 'DOSE_INSULIN', 'DOSE_BASAL_INSULIN']
    
    # Filter DataFrame to include only rows with relevant 'msg_type' values
    filtered_df = df[df['msg_type'].isin(relevant_msg_types)].copy()
    
    # Ensure 'date' column is in datetime format if not already converted
    filtered_df['date'] = pd.to_datetime(filtered_df['date'], errors='coerce', utc=True)
    
    # Drop any rows with invalid or missing dates
    filtered_df.dropna(subset=['date'], inplace=True)
    
    # Sort by date and reset index
    filtered_df = filtered_df.sort_values('date').reset_index(drop=True)
    
    return filtered_df


In [50]:
# This DataFrame helps validate meal times: the model can be checked by testing whether a flagged timestamp corresponds to an actual logged meal or related event
df_actual = filter_meal_related_events(df_679372) 
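
The validation idea above can be sketched with `pd.merge_asof`, which pairs each flagged timestamp with the nearest logged event within a tolerance. This is a sketch under assumptions: the helper name `match_to_events` and the 30-minute tolerance are hypothetical choices, and both inputs are expected to hold UTC timestamps (as `filter_meal_related_events` produces).

```python
import pandas as pd

def match_to_events(flagged_times, events_df, tolerance='30min'):
    """Hypothetical helper: pair each flagged timestamp with the nearest logged event."""
    flagged = pd.DataFrame({'date': pd.to_datetime(flagged_times, utc=True)})
    flagged = flagged.sort_values('date')
    # Keep a copy of each event's own timestamp, since merge_asof keeps only the left 'date'
    events = events_df[['date', 'msg_type']].assign(event_date=events_df['date'])
    events = events.sort_values('date')
    # direction='nearest' searches both before and after each flagged time;
    # flags with no event within the tolerance get NaN in 'msg_type'
    matched = pd.merge_asof(
        flagged, events, on='date',
        direction='nearest', tolerance=pd.Timedelta(tolerance),
    )
    matched['near_event'] = matched['msg_type'].notna()
    return matched
```

Once the anomalies are computed in Step 2, something like `match_to_events(anomalies[anomalies[0]].index, df_actual)['near_event'].mean()` would give the share of flags that fall near a logged event.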

# Cleaned and Preprocessed Data

In [51]:
df_679372_processed = preprocess_data(df_679372)

# Model Training

## Step 1: Identify Relevant Features for the STRAY Model

### Relevant Columns:

`bgl`: The main variable representing blood glucose levels, which could show spikes or patterns around meals.

`dose_units`: Insulin doses often correlate with meals (especially bolus doses) to manage postprandial glucose levels.

`food_g`: The amount of carbohydrates, where non-zero values likely indicate meals or snacks.

`food_glycemic_index`: Helps distinguish regular meals from higher-glycemic foods such as snacks; it is 0.5 for regular meal announcements and 1 for `INTERVENTION_SNACK` events.

`affects_fob` / `affects_iob`: Indicate whether an event affects food on board (FOB) or insulin on board (IOB).

`trend`: Indicates the direction and rate of blood glucose change (e.g. `FLAT`, `SINGLE_UP`, `DOUBLE_DOWN`), which could help capture sharp rises or falls associated with meals.

### Improvement:

Feature selection could be improved. For now, I chose the columns above to test how the STRAY model works with multiple explanatory variables.
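
One simple starting point for better feature selection is to drop columns that never vary and to look for pairs of highly correlated columns, which add little extra information to a distance-based method like STRAY. A minimal sketch, assuming the columns are already numeric; the helper name `screen_features` and the 0.9 correlation cut-off are arbitrary choices for illustration.

```python
import pandas as pd

def screen_features(df, corr_threshold=0.9):
    """Hypothetical helper: drop constant columns and list highly correlated pairs."""
    # Columns with a single distinct value carry no information for outlier detection
    varying = df.loc[:, df.nunique(dropna=True) > 1]
    corr = varying.corr().abs()
    cols = list(corr.columns)
    # Check each unordered pair once (upper triangle of the correlation matrix)
    high_pairs = [
        (a, b, corr.loc[a, b])
        for i, a in enumerate(cols)
        for b in cols[i + 1:]
        if corr.loc[a, b] >= corr_threshold
    ]
    return varying, high_pairs
```

A highly correlated pair (for example `food_g` and `affects_fob`, if they turn out to move together) is a candidate for keeping only one of the two columns.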

In [61]:
def prepare_data_for_stray(df):
    """
    Prepare the DataFrame by selecting relevant columns for STRAY anomaly detection model.
    
    Parameters:
    - df : DataFrame, the original DataFrame.
    
    Returns:
    - df_for_stray : DataFrame, with selected columns and necessary preprocessing applied.
    """
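    # Select columns that are relevant for detecting meal-related anomalies.
    # .copy() creates an independent DataFrame, so the assignments below do not
    # trigger pandas' SettingWithCopyWarning
    df_for_stray = df[['bgl', 'dose_units', 'food_g', 'food_glycemic_index', 
                       'affects_fob', 'affects_iob', 'trend']].copy()
    
    # Encode the categorical 'trend' column on an ordinal scale, from fastest fall to fastest rise.
    # 'SINGLE_DOWN' is assumed to occur in the data; labels missing from the mapping
    # (and 'NOT_COMPUTABLE') become NaN and are dropped below
    trend_mapping = {'DOUBLE_DOWN': -3, 'SINGLE_DOWN': -2, 'FORTYFIVE_DOWN': -1, 'FLAT': 0,
                     'FORTYFIVE_UP': 1, 'SINGLE_UP': 2, 'DOUBLE_UP': 3, 'NOT_COMPUTABLE': None}
    df_for_stray['trend'] = df_for_stray['trend'].map(trend_mapping)
    
    # Drop rows with missing values, since STRAY's nearest-neighbour distances need complete rows
    df_for_stray = df_for_stray.dropna()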
    
    return df_for_stray
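
Because `.map()` turns any trend label missing from `trend_mapping` into NaN, and the following `dropna` removes those rows, it is worth checking which labels the data actually contains. A small sketch; `unmapped_trend_counts` is a hypothetical helper name.

```python
import pandas as pd

def unmapped_trend_counts(trend, mapping):
    """Hypothetical helper: count trend labels that `mapping` does not cover."""
    # Missing trend values are ignored; only labels absent from the mapping are reported
    labels = trend.dropna()
    return labels[~labels.isin(list(mapping))].value_counts()
```

For example, calling it on `df_679372_processed['trend']` with the same dictionary used inside `prepare_data_for_stray` shows which labels would be silently dropped.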

In [62]:
# Apply the preparation step to the preprocessed DataFrame
df_for_stray = prepare_data_for_stray(df_679372_processed)



## Step 2: Train and Apply STRAY for Meal Detection

STRAY is an outlier detection method: it flags observations whose distances to their nearest neighbours are unusually large. Here, such outliers may correspond to unusual blood glucose changes related to meals.
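
The core intuition can be illustrated with a much simpler detector: score each row by its distance to its k-th nearest neighbour and flag scores far above the rest. This is not sktime's STRAY implementation; STRAY uses a more careful neighbour-distance score and a data-driven threshold, whereas this sketch assumes a plain "median plus three median absolute deviations" cut-off. The function name `knn_outlier_flags` is hypothetical.

```python
import numpy as np

def knn_outlier_flags(X, k=3, n_mads=3.0):
    """Simplified k-NN distance outlier detector (illustration only, not STRAY itself)."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all rows
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Sort each row; column 0 is a point's zero distance to itself,
    # so column k is the distance to its k-th nearest neighbour
    scores = np.sort(dists, axis=1)[:, k]
    # Robust cut-off: median score plus n_mads median absolute deviations
    median = np.median(scores)
    mad = np.median(np.abs(scores - median))
    return scores > median + n_mads * mad
```

For four corners of a unit square, their centre and one far-away point, only the far-away point is flagged: its nearest neighbours are all distant, while every other point has close neighbours.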


In [64]:
# setting up STRAY
from sktime.annotation.stray import STRAY

In [67]:
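# Initialize STRAY for anomaly detection. The number of nearest neighbours k can be
# set explicitly, e.g. STRAY(k=...), and should be tuned for this data
stray_model = STRAY()

# Detect anomalies with fit_transform (not fit_predict); it returns one boolean
# flag per row, where True marks an outlier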
anomalies = stray_model.fit_transform(df_for_stray)
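
Because STRAY works with distances between rows, columns on large scales (such as `bgl`) can dominate columns on small scales (such as `trend`). A common precaution for distance-based detectors is to rescale each column to the same range first. A minimal pandas-only sketch of min-max scaling that keeps the date index, so flagged rows can still be traced back to timestamps; `min_max_scale` is a hypothetical helper name.

```python
import pandas as pd

def min_max_scale(df):
    """Hypothetical helper: rescale every column to [0, 1], keeping the index."""
    col_min = df.min()
    col_range = df.max() - col_min
    # Constant columns would divide by zero; give them a range of 1 so they become 0
    col_range = col_range.replace(0, 1)
    return (df - col_min) / col_range
```

It would be used as `stray_model.fit_transform(min_max_scale(df_for_stray))`; the flagged rows would then likely differ from the output shown below, which comes from unscaled data.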

In [73]:
# Show only the rows flagged as outliers
anomalies[anomalies[0]]

| date | 0 |
|---|---|
| 2024-07-01 18:53:25+00:00 | True |
| 2024-07-01 18:57:27+00:00 | True |
| 2024-07-01 19:00:27+00:00 | True |
| 2024-07-01 19:02:27+00:00 | True |
| 2024-07-01 19:04:28+00:00 | True |
| ... | ... |
| 2024-10-01 03:27:07+00:00 | True |
| 2024-10-01 03:30:09+00:00 | True |
| 2024-10-01 03:34:11+00:00 | True |
| 2024-10-01 03:47:20+00:00 | True |
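
The flagged timestamps arrive in runs a few minutes apart (for example 18:53 to 19:04 on 2024-07-01), so a single meal may produce many flags. A sketch that collapses them into episodes, starting a new episode whenever the gap to the previous flag exceeds a threshold; the helper name `group_into_episodes` and the 30-minute gap are assumptions for illustration.

```python
import pandas as pd

def group_into_episodes(flag_times, max_gap='30min'):
    """Hypothetical helper: merge flagged timestamps that are close together."""
    times = pd.Series(pd.to_datetime(flag_times, utc=True)).sort_values().reset_index(drop=True)
    # A new episode starts wherever the gap to the previous flag exceeds max_gap
    new_episode = times.diff() > pd.Timedelta(max_gap)
    episode_id = new_episode.cumsum()
    # Summarize each episode by its first flag, last flag and number of flags
    episodes = times.groupby(episode_id).agg(['min', 'max', 'count'])
    episodes.columns = ['start', 'end', 'n_flags']
    return episodes.reset_index(drop=True)
```

For example, `group_into_episodes(anomalies[anomalies[0]].index)` would give one row per episode, and each episode's start time can then be compared against the events in `df_actual`.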
