# License Notice

Copyright (c) 2024 Warren Bebbington

This notebook is part of the simple-glucose-analysis project and is licensed under the MIT License. For the full license text, please see the LICENSE file in the project's root directory.

In [None]:
from sqlalchemy import create_engine, inspect
import pandas as pd

# How to Back Up the SQLite Database from the XDrip+ Android App

To manually back up the SQLite database in the XDrip+ app and save it for use in your `simple_glucose_analysis` project, follow these steps:

## Steps to Backup the Database

1. **Open XDrip+ App**:
   - Launch the XDrip+ app on your Android device.

2. **Access the Menu**:
   - Tap the **hamburger menu** (three horizontal lines) located at the top right of the screen.

3. **Select Import/Export**:
   - From the dropdown menu, select **Import/Export**.

4. **Export Database**:
   - Choose the **Export Database** option.
   - Follow any prompts to confirm the backup location if necessary.

5. **Save the Database File**:
   - When prompted to select a save location, choose a folder that is easily accessible.
   - **Important**: Save the database file (typically named `export.sqlite`) in the main directory of your `simple_glucose_analysis` project.

6. **Verify Backup**:
   - Ensure the database file is saved correctly in your project directory. You can check this using a file explorer on your device or your computer.

## Using the Database in Your Project

Once the database file is saved in the `simple_glucose_analysis` project directory, you can load it into the preprocessing notebook.

**Note**: It's good practice to back up your database regularly to prevent data loss!


### Load your XDrip+ SQLite backup
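One pitfall worth guarding against: if `db_path` does not point at an existing file, SQLite will normally create a new, empty database at that path instead of raising an error, so the table listing below just prints `[]`. The helper here is a minimal sketch (`check_sqlite_file` is a hypothetical name, not part of the project) that confirms the file exists and begins with the standard 16-byte SQLite 3 header before you connect.

```python
from pathlib import Path

# Every SQLite 3 database file begins with this 16-byte header string
SQLITE_HEADER = b"SQLite format 3\x00"


def check_sqlite_file(db_path: str) -> Path:
    """Hypothetical helper: fail loudly if db_path is missing or not an SQLite file."""
    path = Path(db_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"No file found at {path.resolve()}; check db_path")
    # Read just the header bytes and compare them with the SQLite signature
    with path.open("rb") as fh:
        header = fh.read(len(SQLITE_HEADER))
    if header != SQLITE_HEADER:
        raise ValueError(f"{path} does not look like an SQLite database")
    return path
```

Calling `check_sqlite_file(db_path)` just before `create_engine(...)` turns a silent empty result into a clear error; if it fails, check the path and that the exported file was copied across in full.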

In [None]:
# Path to your SQLite file
db_path = 'path-to-your-file.sqlite'

# Create an SQLAlchemy engine
engine = create_engine(f'sqlite:///{db_path}')

# Use SQLAlchemy's inspector to list all tables
inspector = inspect(engine)
tables = inspector.get_table_names()
print(tables)

In [None]:
# Load BgReadings table into a pandas DataFrame
glucose_data = 'BgReadings'  # Table containing all BG Readings from XDrip+
bg_df = pd.read_sql_table(glucose_data, con=engine)
bg_df['timestamp'] = pd.to_datetime(bg_df['timestamp'], unit='ms')

# Load Treatments table into a pandas DataFrame
treatments_data = 'Treatments'  # Table containing all Treatments from XDrip+
treatments_df = pd.read_sql_table(treatments_data, con=engine)
treatments_df['timestamp'] = pd.to_datetime(treatments_df['timestamp'], unit='ms')

# Explore the first few rows of the blood glucose table
bg_df.head()

In [None]:
treatments_df.head()

We can see that XDrip+ uses the `insulin` column to store both basal and bolus doses. These can be told apart using the `insulinJSON` column, which records the insulin type you configured in XDrip+: in this case Novorapid (bolus) and Levemir (basal). The plan is to go through every row of the treatments table where `insulin` is above 0.0 and check `insulinJSON` for the word 'Novorapid': if it is present, the value is moved to a new column named `bolus`, otherwise to a new column named `basal`. The remaining unneeded columns of the treatments table are dropped afterwards.

**UPDATE** - The word Novorapid is not always present in the `insulinJSON` column, so we will use the word 'Levemir' instead to try to isolate basal doses. This may differ depending on how you set up XDrip+.
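If your own XDrip+ setup does record the insulin name consistently, the keyword idea can be written as a vectorised pandas expression rather than a row-by-row loop. This is a minimal sketch, assuming `insulinJSON` holds text that contains the insulin name (as described above); the keyword `'Levemir'` and the function name `split_by_keyword` are placeholders to adapt to your own profiles.

```python
import pandas as pd


def split_by_keyword(df: pd.DataFrame, basal_keyword: str = "Levemir") -> pd.DataFrame:
    """Hypothetical helper: route positive insulin doses to 'basal' or 'bolus' by keyword."""
    out = df.copy()
    dosed = out["insulin"] > 0
    # Case-insensitive plain-text match; a missing insulinJSON counts as "no match"
    is_basal = (
        out["insulinJSON"].fillna("").astype(str)
        .str.contains(basal_keyword, case=False, regex=False)
    )
    # .where keeps the dose where the condition holds and puts NaN elsewhere
    out["basal"] = out["insulin"].where(dosed & is_basal)
    out["bolus"] = out["insulin"].where(dosed & ~is_basal)
    return out
```

As the updates note, this only works if every basal entry really carries the keyword: any unlabelled basal dose silently lands in `bolus`.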

**UPDATE** - Neither value is consistent enough to distinguish the insulin type, so I will use a cut-off of 10 units instead: doses of 10 units or more are treated as basal and smaller doses as bolus. I chose 10 because my basal dose has always been above this and my maximum bolus dose is 6 units, so this should adequately separate the two in my data. You may need to adjust these values for your own data.
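Before relying on a fixed cut-off, it is worth checking that your own doses actually fall cleanly on either side of it. The sketch below uses a hypothetical helper, `summarise_split`, applying the same `>= threshold` rule as the notebook, and reports the count and range of doses on each side. If the largest "bolus" sits close to the smallest "basal", the threshold is probably not safe for your data.

```python
import pandas as pd


def summarise_split(insulin: pd.Series, threshold: float = 10.0) -> pd.DataFrame:
    """Hypothetical helper: count and range of doses either side of the cut-off."""
    doses = insulin[insulin > 0]  # NaN and zero entries are ignored
    is_basal = doses >= threshold  # same rule as the notebook: >= threshold is basal
    groups = {"bolus": doses[~is_basal], "basal": doses[is_basal]}
    return pd.DataFrame(
        {
            "count": {name: len(values) for name, values in groups.items()},
            "min": {name: values.min() for name, values in groups.items()},
            "max": {name: values.max() for name, values in groups.items()},
        }
    )
```

Once the treatments table is loaded, `summarise_split(treatments_df['insulin'])` gives a quick view of how well the cut-off separates your doses.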

### Save Raw Data

We will save the data in csv files for your own use. The BgReadings table contains more data worth looking into, and there seem to be other useful tables, including HeartRate (recorded by XDrip+ if health data is available on the Android device, e.g. from a smartwatch), Calibrations (calibration data), BloodReadings (finger-prick results) and more.
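If you want to explore those other tables, it is safer to check which ones your export actually contains than to assume their names. This is a minimal sketch using the standard-library `sqlite3` module with pandas; the table names you pass in are assumptions taken from the description above, so compare them with the list printed earlier by `inspector.get_table_names()`. `load_optional_tables` is a hypothetical helper name.

```python
import sqlite3

import pandas as pd


def load_optional_tables(conn: sqlite3.Connection, wanted: list[str]) -> dict[str, pd.DataFrame]:
    """Hypothetical helper: load each requested table that exists, skip the rest."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    existing = {name for (name,) in rows}
    frames = {}
    for name in wanted:
        if name in existing:
            # Quote the identifier in case a table name contains unusual characters
            frames[name] = pd.read_sql_query(f'SELECT * FROM "{name}"', conn)
        else:
            print(f"Table {name!r} not found in this export; skipping")
    return frames
```

For example, `load_optional_tables(sqlite3.connect(db_path), ['HeartRate', 'Calibrations', 'BloodReadings'])` returns a dictionary of DataFrames for whichever of those tables exist.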

In [None]:
import os
os.makedirs('data', exist_ok=True)  # create the output folder if it is missing
bg_df.to_csv('data/raw_bg.csv')
treatments_df.to_csv('data/raw_treatments.csv')

In [None]:
bg_df.info()

In [None]:
# Create two new columns 'bolus' and 'basal', initializing with NaN values
treatments_df['bolus'] = float('nan')
treatments_df['basal'] = float('nan')

# Filter rows where insulin > 0
insulin_positive = treatments_df['insulin'] > 0

# Filter rows where insulin >= 10
above_10 = treatments_df['insulin'] >= 10

# For rows where 'insulin' > 0 and 'insulin' is >= 10, assign to 'basal'
treatments_df.loc[insulin_positive & above_10, 'basal'] = treatments_df['insulin']

# For rows where 'insulin' > 0 and 'insulin' is < 10, assign to 'bolus'
treatments_df.loc[insulin_positive & ~above_10, 'bolus'] = treatments_df['insulin']

# Display the updated DataFrame to check the result
print(treatments_df[['insulin', 'bolus', 'basal']])

In [None]:
treatments_df.info()

### Unneeded data

We will now drop all columns that are not needed for the analysis.

In [None]:
# Create dataframes with only our required columns and rename calculated_value to glucose
bg_df = bg_df[['calculated_value', 'timestamp']].copy()
bg_df.set_index('timestamp', inplace=True)
bg_df.rename(columns={'calculated_value': 'glucose'}, inplace=True)

treatments_df = treatments_df[['carbs', 'basal', 'bolus', 'timestamp']].copy()
treatments_df.set_index('timestamp', inplace=True)

In [None]:
bg_df

In [None]:
treatments_df

### Resample data

We will resample both tables to 5-minute intervals: glucose readings that fall within the same 5-minute interval are averaged, and multiple treatments within the same interval are summed, with each interval labelled by its start time. This aligns both tables on a common time grid whilst maintaining the temporal relationship between treatments and blood glucose readings.
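To see how the binning works, here is a small example with made-up carb entries. With pandas' defaults for a `'5min'` rule, each bin covers a half-open interval such as 10:00 (inclusive) to 10:05 (exclusive) and is labelled by its start time, so entries at 10:02 and 10:04 are summed into the 10:00 row.

```python
import pandas as pd

# Hypothetical carb entries: two in the same 5-minute window, one in the next
idx = pd.to_datetime(["2024-01-01 10:02", "2024-01-01 10:04", "2024-01-01 10:07"])
carbs = pd.Series([10.0, 15.0, 5.0], index=idx, name="carbs")

# Treatments are summed per bin, as in the notebook; glucose would use .mean()
binned = carbs.resample("5min").sum()
print(binned)
# The 10:00 bin holds 10 + 15 = 25.0 and the 10:05 bin holds 5.0
```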

In [None]:
# Resample bg_data to 5-minute intervals
bg_df = bg_df.resample('5min').mean()

# Resample treatments_df to 5-minute intervals, aggregating data
# You can choose different aggregation methods, e.g., sum, mean, first, etc.
treatments_df = treatments_df.resample('5min').sum()  # The sum of all values for the same 5-minute intervals

# Create a date range covering the entire period
full_date_range = pd.date_range(start=min(bg_df.index.min(), treatments_df.index.min()),
                                end=max(bg_df.index.max(), treatments_df.index.max()),
                                freq='5min')

# Reindex both dataframes to this full range (this will add missing timestamps with NaNs)
bg_df = bg_df.reindex(full_date_range)  # Missing timestamps stay NaN so gaps can be detected below
treatments_df = treatments_df.reindex(full_date_range).fillna(0)  # No treatment recorded means 0

### Handle missing glucose level data

We will inspect the glucose readings data for any gaps in the glucose values.
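The gap detection in the next cell uses a common run-length trick: each `is_gap` flag is compared with the one before it, and a cumulative sum of those changes gives every unbroken run of missing (or present) readings its own `gap_group` number. Counting the rows in each missing run then gives its length in 5-minute slots. A toy example with six readings:

```python
import numpy as np
import pandas as pd

# Toy series on a 5-minute grid: one 2-slot gap and one 1-slot gap
glucose = pd.Series([5.0, np.nan, np.nan, 6.0, np.nan, 7.0])

is_gap = glucose.isna()
# A new group starts wherever the gap flag changes from the row before
gap_group = (is_gap != is_gap.shift()).cumsum()

# Length of each missing run in rows, then converted to minutes
run_lengths = is_gap[is_gap].groupby(gap_group[is_gap]).size()
print(run_lengths * 5)  # two missing runs: 10 minutes and 5 minutes
```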

In [None]:
# Identify gaps in glucose readings
bg_df['is_gap'] = bg_df['glucose'].isna()
bg_df['gap_group'] = (bg_df['is_gap'] != bg_df['is_gap'].shift()).cumsum()
gaps = bg_df[bg_df['is_gap']].groupby('gap_group')
gaps_greater_than_60min = gaps.filter(lambda x: len(x) >= 12)  # 12 missing 5-minute slots = 60 minutes

number_of_gaps = len(gaps_greater_than_60min['gap_group'].unique())
print(f"Number of gaps of 60 minutes or more: {number_of_gaps}")

In [None]:
# Save to csv if you wish to inspect for further insight into missing glucose readings in your data
gaps_greater_than_60min.to_csv('data/biggaps.csv')

In [None]:
if number_of_gaps > 0:
    print("Gaps of 60 minutes or more:")
    print(gaps_greater_than_60min.groupby('gap_group').first())

## Combine

We will now combine the dataframes, drop all rows that belong to gaps of 60 minutes or more in the glucose readings, and linearly interpolate the shorter gaps. Finally, we add day-of-week and time columns and discard the full timestamp. This is to help anonymise my data and can be skipped if you are using your own data and want to see the actual dates and times; in that case some of the analysis script will need modifying to display your actual date ranges. We will then export the data for use in the analysis.
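The same rule (discard runs of 12 or more missing 5-minute slots, interpolate anything shorter) can also be packaged as one reusable function that works on the glucose series alone. This is a sketch of an equivalent approach rather than the notebook's exact code: `fill_short_gaps` and its `max_missing` parameter are hypothetical names, and unlike the notebook it passes `limit_area='inside'`, so missing values at the very start or end of the series are left as NaN rather than filled.

```python
import pandas as pd


def fill_short_gaps(glucose: pd.Series, max_missing: int = 11) -> pd.Series:
    """Hypothetical helper: interpolate runs of <= max_missing NaNs, drop longer runs."""
    is_gap = glucose.isna()
    run_id = (is_gap != is_gap.shift()).cumsum()
    # Length of the run each row belongs to (for missing and present runs alike)
    run_len = run_id.map(run_id.value_counts())
    long_gap = is_gap & (run_len > max_missing)
    # Interpolate only between real readings; long gaps are dropped afterwards
    filled = glucose.interpolate(method="linear", limit_area="inside")
    return filled[~long_gap]
```

With the default `max_missing=11`, a run of 12 missing slots (60 minutes) is dropped, matching the `< 12` test used in the notebook.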

In [None]:
# Combine the dataframes
combined_df = pd.concat([bg_df, treatments_df], axis=1)
combined_df

In [None]:
combined_df.info()

In [None]:
# Create a 'day_of_week' column holding the weekday name from the DatetimeIndex
combined_df['day_of_week'] = combined_df.index.day_name()

# Create a 'time' column by extracting the time part of the DatetimeIndex
combined_df['time'] = combined_df.index.time

# Uncomment this line to keep the real timestamp as a column (the index is discarded later)
# combined_df['actual_timestamp'] = combined_df.index

In [None]:
# Step 1: Count the number of rows in each 'gap_group'
group_sizes = combined_df.groupby('gap_group').size()

# Step 2: Identify the gap groups that are smaller than 12 rows
small_gap_groups = group_sizes[group_sizes < 12].index

# Step 3: Filter the DataFrame to include:
# - Rows where 'gap_group' is in small_gap_groups
# - OR rows where 'glucose' is not NaN
filtered_df = combined_df[
    (combined_df['gap_group'].isin(small_gap_groups)) | 
    (combined_df['glucose'].notna())
]

# Step 4: Interpolate the remaining gaps in glucose column and drop gaps columns
filtered_df = filtered_df.copy() # Create copy of dataframe to avoid setting value in df slice warning
filtered_df['glucose'] = filtered_df['glucose'].interpolate(method='linear')
filtered_df = filtered_df.drop(columns=['is_gap', 'gap_group'])

# Step 5: Replace the timestamp index with a plain 0..n-1 index (this discards the dates) and inspect the result
filtered_df.reset_index(drop=True, inplace=True)
print(filtered_df)

In [None]:
filtered_df.info()

## Personalisations

Feel free to use the lines below to adapt the data to your own metrics.
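The conversion in the commented-out line below assumes the glucose values are in mg/dL. The factor of roughly 18 comes from the molar mass of glucose (about 180.16 g/mol): 1 mmol/L is 180.16 mg per litre, or about 18.016 mg/dL. A small sketch with the factor as a parameter; the helper name `mgdl_to_mmol` is hypothetical, and its default of 18.0 matches the notebook's line.

```python
import pandas as pd

# Molar mass of glucose is about 180.16 g/mol, so 1 mmol/L is about 18.016 mg/dL
MGDL_PER_MMOLL = 18.016


def mgdl_to_mmol(glucose_mgdl: pd.Series, factor: float = 18.0) -> pd.Series:
    """Hypothetical helper: convert mg/dL to mmol/L (18.0 is the common rounded factor)."""
    return glucose_mgdl / factor


readings = pd.Series([72.0, 180.0])
print(mgdl_to_mmol(readings).round(1).tolist())  # [4.0, 10.0]

# Using the more precise factor makes little difference at one decimal place
precise = mgdl_to_mmol(readings, MGDL_PER_MMOLL)
```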

In [None]:
# Uncomment the below lines to adjust your data

# Convert glucose from mg/dL to mmol/L using the standard factor of 18
# filtered_df['glucose'] = filtered_df['glucose'] / 18.0

In [None]:
filtered_df

## Export your data

If you are running the analysis on your own data, you can export it to a csv file now and begin the analysis. Be aware that this data will span whatever period your XDrip+ backup covers, not just the 90 days of the sample data.


In [None]:
filtered_df.to_csv('data/processed_data.csv')

### End of Notebook
(c) 2024 Warren Bebbington 