# Checking the accuracy of sleep details reported by two different sleep analysis devices

author: A.Norouzi

The research question addressed in this assignment is:
- Do the sleep stages reported by the Withings Sleep Analyzer and the Fitbit Charge 5 smartwatch overlap?

In [1]:
# Essential libraries for data manipulation and analysis
import numpy as np
import pandas as pd
import yaml

from datetime import datetime
from datetime import timedelta

import openpyxl
import os

# own modules
import stats_functions as sf

import cv2

import panel as pn
from bokeh.plotting import figure
from bokeh.transform import factor_cmap
from bokeh.palettes import Spectral10

from PIL import Image

pn.extension()

In this section, data from the Withings Sleep Analyzer is read from the CSV files placed in the 'data' folder. A 'Person' column is also added to the data; it holds the name of the source file, which serves as the code for the person being tested.
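The loader itself lives in the author's `stats_functions` module and is not shown here. As a rough sketch of the idea only, the hypothetical `load_folder` below reads every CSV in a folder and tags each row with the file stem; the function name and the assumption of one CSV per person are illustrative, not taken from the module.

```python
from pathlib import Path

import pandas as pd


def load_folder(folder):
    """Read every CSV in `folder` and tag its rows with the file stem as 'Person'.

    Hypothetical stand-in for sf.load_data_to_dict; the real helper may differ.
    """
    frames = []
    for csv_path in sorted(Path(folder).glob("*.csv")):
        df = pd.read_csv(csv_path)
        df["Person"] = csv_path.stem  # e.g. "P1.csv" -> "P1"
        frames.append(df)
    if not frames:
        raise FileNotFoundError(f"no CSV files found in {folder!r}")
    return pd.concat(frames, ignore_index=True)
```

Raising on an empty folder is a design choice for the sketch: a silent empty dataframe would only fail later, in the transformation cells.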

In [2]:
# Function to read the configuration file
def get_config():
    with open('config.yaml', 'r') as df_file:
        config = yaml.safe_load(df_file)
    return config



# Load configuration
config = get_config()

# Assuming your YAML file provides a folder path under a key
folder_path = config['sleep']

# Load data into a dataframe
main_withing_df = sf.load_data_to_dict(folder_path)

withing_visual = main_withing_df.copy()
main_withing_df



Unnamed: 0,start,duration,value,Person
0,2024-01-10T21:57:00+01:00,"[60,60,60,60,60,60,60,60,60,60,60,60,60,60,60,...","[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1...",P1
1,2024-01-11T00:19:00+01:00,"[60,60,60]","[0,0,0]",P1
2,2024-01-11T00:27:00+01:00,"[60,60,60,60,60,60,60,60,60,60,60,60,60,60,60,...","[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1...",P1
3,2024-01-11T03:27:00+01:00,"[60,60,60,60,60,60,60,60,60,60,60,60,60,60,60,...","[1,1,1,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3...",P1
4,2024-01-11T06:27:00+01:00,"[60,60,60,60,60,60,60,60,60,60,60,60,60,60,60,...","[3,3,3,3,3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1...",P1
...,...,...,...,...
639,2023-12-24T02:13:00+01:00,"[60,60,60,60,60,60,60,60,60,60,60,60,60,60,60,...","[0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1...",P2
640,2023-12-24T03:59:00+01:00,"[60,60,60,60,60,60,60,60,60,60,60,60,60,60,60,...","[3,3,3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1...",P2
641,2023-12-24T06:59:00+01:00,"[60,60,60,60,60,60,60,60,60,60,60,60,60,60,60,...","[2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1,1,1,1,1,1...",P2
642,2023-12-24T09:59:00+01:00,"[60,60,60,60,60,60,60,60,60,60,60,60,60,60,60,...","[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1...",P2


## Inspect the Withings data

In [3]:
main_withing_df.shape

(644, 4)

In [4]:
main_withing_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 644 entries, 0 to 643
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   start     644 non-null    object
 1   duration  644 non-null    object
 2   value     644 non-null    object
 3   Person    644 non-null    object
dtypes: object(4)
memory usage: 20.3+ KB


In this data frame, the 'value' column holds the sequence of sleep states for each recording, and the 'duration' column holds the length of each state in seconds.

Sleep states in the Withings data:
- 0 -> awake
- 1 -> light sleep
- 2 -> deep sleep
- 3 -> REM


In the next sections, the Withings data is transformed and organized into a format suitable for visual presentation and data analysis.
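The list-valued columns arrive as strings such as "[60,60,60]". The sketch below shows, on a single toy row, how such strings can be parsed and expanded into one row per sample. `parse_list` is a hypothetical stand-in for the helpers in `stats_functions`, and the one-minute step assumes every duration entry is 60 seconds, as in the data shown above.

```python
import json

import pandas as pd


def parse_list(text):
    """Turn a string such as "[0,0,1]" into a list of ints."""
    return json.loads(text)


raw = pd.DataFrame({
    "start": ["2024-01-11T00:19:00+01:00"],
    "duration": ["[60,60,60]"],
    "value": ["[0,0,1]"],
})
raw["start"] = pd.to_datetime(raw["start"])
raw["value"] = raw["value"].apply(parse_list)

# One row per list element; the original index label is repeated for each element
long = raw.explode("value")

# cumcount() numbers the samples 0, 1, 2, ... within each original row,
# which gives the offset in minutes from the recording start
offset = pd.to_timedelta(long.groupby(level=0).cumcount(), unit="min")
long["start"] = long["start"] + offset
long = long.reset_index(drop=True)
```

After this step the toy row has become three rows, one per minute, each carrying a single sleep state.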

In [5]:

# Convert 'start' to datetime and sort chronologically
withing_visual['start'] = pd.to_datetime(withing_visual['start'])
withing_visual = withing_visual.sort_values(by='start')

# Remove rows with a duplicate start time
withing_visual = withing_visual.drop_duplicates(subset='start', keep='first')

# Reset the index after sorting and de-duplicating
withing_visual = withing_visual.reset_index(drop=True)


# Parse the list strings: count the entries in 'duration' and split 'value' into a list
withing_visual['duration'] = withing_visual['duration'].apply(sf.count_elements)
withing_visual['value'] = withing_visual['value'].apply(sf.separate_elements)


withing_visual = withing_visual.explode('value')
time_increments = pd.to_timedelta(withing_visual.groupby(level=0).cumcount(), unit='minute')

# Adjust the 'start' time for each row by adding the time increments
withing_visual['start'] = withing_visual['start'] + time_increments

#reset index
withing_visual=withing_visual.reset_index(drop=True)

# Make a Date column
withing_visual['Date'] = withing_visual['start'].dt.date

# Change the column names
withing_visual = withing_visual.rename(columns = {'start' :'Datetime', 'value':'Value', 'duration':'Duration'})

# Expand each one-minute row into 60 one-second rows
withing_visual = withing_visual.loc[withing_visual.index.repeat(60)].reset_index(drop=True)
withing_visual['Datetime'] = withing_visual.groupby(withing_visual.index // 60)['Datetime'].transform(lambda x: x + pd.to_timedelta(range(len(x)), unit='s'))

# Recode the Withings states to the shared scale (1 DEEP, 2 LIGHT, 3 REM, 4 AWAKE) and add a label column
replacement_whiting_app = {'1': '2', '2': '1', '0': '4'}
withing_visual['Value'] = withing_visual['Value'].replace(replacement_whiting_app)
Withing_Value_Description = {'1': 'DEEP', '2': 'LIGHT','3': 'REM','4': 'AWAKE'}
withing_visual['Sleep Stage'] = withing_visual['Value'].map(Withing_Value_Description)

data = withing_visual.copy()
withing_visual


Unnamed: 0,Datetime,Duration,Value,Person,Date,Sleep Stage
0,2023-12-23 21:59:00+01:00,9,4,P2,2023-12-23,AWAKE
1,2023-12-23 21:59:01+01:00,9,4,P2,2023-12-23,AWAKE
2,2023-12-23 21:59:02+01:00,9,4,P2,2023-12-23,AWAKE
3,2023-12-23 21:59:03+01:00,9,4,P2,2023-12-23,AWAKE
4,2023-12-23 21:59:04+01:00,9,4,P2,2023-12-23,AWAKE
...,...,...,...,...,...,...
1427635,2024-01-15 10:17:55+01:00,174,3,P2,2024-01-15,REM
1427636,2024-01-15 10:17:56+01:00,174,3,P2,2024-01-15,REM
1427637,2024-01-15 10:17:57+01:00,174,3,P2,2024-01-15,REM
1427638,2024-01-15 10:17:58+01:00,174,3,P2,2024-01-15,REM


The Withings data is now prepared for processing. In the next step it is plotted and compared with the application's own chart for validation.

In [6]:



# Function to create a vbar plot for a given date
def create_vbar_plot(selected_date, selected_person):
    df = data[(data['Date'] == selected_date) & (data['Person'] == selected_person)]
    plot = figure(x_axis_type='datetime', title=f'Values on {selected_date} for {selected_person}', 
                  sizing_mode='stretch_width', height=250, tools="")

    # Create a color mapper
    categories = sorted(df['Value'].unique())
    color_mapper = factor_cmap('Value', palette=Spectral10, factors=categories)

    # Create vbar plot
    plot.vbar(x='Datetime', top='Value', source=df, width=timedelta(seconds=1), 
              color=color_mapper, legend_field='Sleep Stage')

    plot.legend.title = 'Value'
    plot.legend.location = 'top_right'
    plot.legend.orientation = 'horizontal'

    return plot

# Prepare date options for the dropdown
date_options = sorted(data['Date'].unique())
person_options = ['P1', 'P2']
person_select = pn.widgets.Select(name='Select Person:', options=person_options, value=person_options[0])
date_select = pn.widgets.Select(name='Select Date:', options=[str(date) for date in date_options], value=str(date_options[0]))

# Interactive panel
@pn.depends(date_select.param.value, person_select.param.value)
def update_plot(date, person):
    return create_vbar_plot(pd.to_datetime(date).date(), person)

#image
image_path = config['img']
image = Image.open(image_path)
image_pane = pn.pane.JPG(image, width=170)

# Display the panel
dashboard = pn.Column(
    pn.Row(date_select, person_select,image_pane,),
    pn.Row(update_plot)
)

dashboard.servable()



In [7]:
# Reload the configuration (get_config is defined in an earlier cell)
config = get_config()

# Manually entered Fitbit sleep stages for P1
manual_df = pd.read_excel(config['fitbit-m-P1'])

manual_visual = manual_df.copy()
manual_df

Unnamed: 0,Date,Person,Start Time,End Time,Manual Value
0,2023-12-27,P1,22:59:30,23:04:00,AWAKE
1,2023-12-27,P1,23:04:00,23:07:30,LIGHT
2,2023-12-27,P1,23:07:30,23:14:30,AWAKE
3,2023-12-27,P1,23:14:30,23:42:00,LIGHT
4,2023-12-27,P1,23:42:00,23:55:00,DEEP
...,...,...,...,...,...
367,2024-01-11,P1,07:36:30,07:41:00,REM
368,2024-01-11,P1,07:41:00,08:03:30,LIGHT
369,2024-01-11,P1,08:03:30,08:33:30,DEEP
370,2024-01-11,P1,08:33:30,08:39:30,AWAKE


To verify the manually entered data, its chart is compared with the chart in the Fitbit application. Before the comparison, the data is expanded to one row per second for plotting:
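The expansion relies on `sf.create_time_range`, which is not shown in this notebook. A minimal sketch of what such a helper might do is given below; `expand_interval` is a hypothetical name, and both the half-open interval and the midnight rollover are assumptions rather than the author's confirmed behaviour.

```python
import pandas as pd


def expand_interval(date, start, end, label):
    """Return one row per second in [start, end), each labelled with `label`.

    Assumption: if `end` is not later than `start`, the interval is taken
    to cross midnight and `end` is moved to the following day.
    """
    begin = pd.Timestamp(f"{date} {start}")
    finish = pd.Timestamp(f"{date} {end}")
    if finish <= begin:
        finish += pd.Timedelta(days=1)
    n_seconds = int((finish - begin).total_seconds())
    stamps = pd.date_range(begin, periods=n_seconds, freq="s")
    return pd.DataFrame({"Datetime": stamps, "Manual Value": label})


# First manual row shown above: AWAKE from 22:59:30 to 23:04:00
awake = expand_interval("2023-12-27", "22:59:30", "23:04:00", "AWAKE")
```

Using a half-open interval means that consecutive stages, where one ends at the second the next begins, do not produce duplicate timestamps.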

In [8]:

# Map for manual values
Manual_Value_Description = {'DEEP':'1', 'LIGHT': '2','REM':'3' ,'AWAKE': '4'}

# Expand every manual interval into per-second rows and combine them in one step
time_ranges = [
    sf.create_time_range(row['Start Time'], row['End Time'], row['Date'], row['Person'], row['Manual Value'])
    for _, row in manual_visual.iterrows()
]
df_time_ranges = pd.concat(time_ranges, ignore_index=True)

# Map the 'Manual Value' to 'Value'
df_time_ranges['Value'] = df_time_ranges['Manual Value'].map(Manual_Value_Description)
df_time_ranges['Date'] = df_time_ranges['Datetime'].dt.date

manual_visual_data = df_time_ranges.copy()
manual_visual_data


Unnamed: 0,Datetime,Manual Value,Person,Value,Date
0,2023-12-27 22:59:30,AWAKE,P1,4,2023-12-27
1,2023-12-27 22:59:31,AWAKE,P1,4,2023-12-27
2,2023-12-27 22:59:32,AWAKE,P1,4,2023-12-27
3,2023-12-27 22:59:33,AWAKE,P1,4,2023-12-27
4,2023-12-27 22:59:34,AWAKE,P1,4,2023-12-27
...,...,...,...,...,...
575040,2024-01-11 08:51:56,LIGHT,P1,2,2024-01-11
575041,2024-01-11 08:51:57,LIGHT,P1,2,2024-01-11
575042,2024-01-11 08:51:58,LIGHT,P1,2,2024-01-11
575043,2024-01-11 08:51:59,LIGHT,P1,2,2024-01-11


Comparing the data plot with the application's plot:

In [9]:

# Function to create a line plot for a given date and person
def create_vline_plot(selected_date, selected_person):
    df = df_time_ranges[(df_time_ranges['Date'] == selected_date) & (df_time_ranges['Person'] == selected_person)]
    plot = figure(x_axis_type='datetime', title=f'Values on {selected_date} for {selected_person}', 
                  sizing_mode='stretch_width', height=250, tools="")

    # Create a color mapper
    categories = sorted(df['Value'].unique())
    color_mapper = factor_cmap('Value', palette=Spectral10, factors=categories)

    # Create vbar plot
    # plot.vbar(x='Datetime', top='Value', source=df, width=timedelta(seconds=1), 
    #           color=color_mapper, legend_field='Manual Value')
    
    plot.line(x='Datetime', y='Value',line_width=2, source=df,
              color='blue')

    # Name the y-axis levels
    plot.yaxis.ticker = [1, 2, 3, 4]
    plot.yaxis.major_label_overrides = {1: 'Deep', 2: 'Light', 3: 'REM', 4: 'Awake'}

    return plot

# Prepare date options for the dropdown
date_options = sorted(manual_visual_data['Date'].unique())
person_options = ['P1', 'P2']
person_select = pn.widgets.Select(name='Select Person:', options=person_options, value=person_options[0])
date_select = pn.widgets.Select(name='Select Date:', options=[str(date) for date in date_options], value=str(date_options[0]))

# Interactive panel
@pn.depends(date_select.param.value, person_select.param.value)
def update_plot(date, person):
    return create_vline_plot(pd.to_datetime(date).date(), person)

#image
image_path = config['fitbit-img']
image = Image.open(image_path)
image_pane = pn.pane.PNG(image, width=900)

# Display the panel
dashboard = pn.Column(
    pn.Row(date_select, person_select),
    pn.Row(update_plot),
    pn.Row(image_pane)
)

# Display the panel
dashboard.servable()



The comparison shows that the Fitbit data has been entered manually without errors, so the analysis can proceed.
In this step, the manually entered Fitbit data and the data from the Withings device are displayed on one plot for an initial comparison.
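Before the two sources can share a plot they have to be aligned on a common time axis. The toy example below sketches that alignment under two assumptions: the Withings timestamps carry a UTC offset while the manual ones do not, and both are already at one-second resolution. The frames and their values are invented for illustration.

```python
import pandas as pd

withings = pd.DataFrame({
    "Datetime": pd.date_range("2023-12-27 22:59:58+01:00", periods=4, freq="s"),
    "Value": ["4", "4", "2", "2"],
})
fitbit = pd.DataFrame({
    "Datetime": pd.date_range("2023-12-27 22:59:59", periods=4, freq="s"),
    "Value": ["4", "2", "2", "2"],
})

# Drop the UTC offset but keep the local wall-clock time
withings["Datetime"] = withings["Datetime"].dt.tz_localize(None)

# Inner join: only the seconds present in both sources survive
both = withings.merge(fitbit, on="Datetime", how="inner",
                      suffixes=("_withing", "_manual"))
```

Only the three overlapping seconds remain in `both`, which is why the merged frame in the notebook is shorter than either input.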

In [10]:
# Work on copies so the original dataframes stay unchanged
statistic_fitbit_manual = manual_visual_data.copy()
statistic_withing = withing_visual.copy()


In [11]:
# Keep P1 only; .copy() avoids modifying a view of the original frame
statistic_withing = statistic_withing[statistic_withing['Person'] == 'P1'].copy()
statistic_withing = statistic_withing.rename(columns={'Value': 'Value_withing', 'Sleep Stage': 'SleepStage_withing'})
# Drop the timezone so 'Datetime' is comparable with the timezone-naive manual data
statistic_withing['Datetime'] = statistic_withing['Datetime'].dt.tz_localize(None)
statistic_withing = statistic_withing.drop(columns=['SleepStage_withing'])

statistic_fitbit_manual = statistic_fitbit_manual.rename(columns={'Value': 'Value_manual', 'Manual Value': 'SleepStage_manual'})
statistic_fitbit_manual = statistic_fitbit_manual.drop(columns=['SleepStage_manual'])

# Inner join keeps only the seconds recorded by both sources
merged_df = pd.merge(statistic_withing, statistic_fitbit_manual, on='Datetime', how='inner')
merged_df


Unnamed: 0,Datetime,Duration,Value_withing,Person_x,Date_x,Person_y,Value_manual,Date_y
0,2023-12-27 22:59:30,180,4,P1,2023-12-27,P1,4,2023-12-27
1,2023-12-27 22:59:31,180,4,P1,2023-12-27,P1,4,2023-12-27
2,2023-12-27 22:59:32,180,4,P1,2023-12-27,P1,4,2023-12-27
3,2023-12-27 22:59:33,180,4,P1,2023-12-27,P1,4,2023-12-27
4,2023-12-27 22:59:34,180,4,P1,2023-12-27,P1,4,2023-12-27
...,...,...,...,...,...,...,...,...
468247,2024-01-11 08:51:56,147,2,P1,2024-01-11,P1,2,2024-01-11
468248,2024-01-11 08:51:57,147,2,P1,2024-01-11,P1,2,2024-01-11
468249,2024-01-11 08:51:58,147,2,P1,2024-01-11,P1,2,2024-01-11
468250,2024-01-11 08:51:59,147,2,P1,2024-01-11,P1,2,2024-01-11


In [12]:

# Function to create a two-line comparison plot for a given date and person
def create_vline2_plot(selected_date, selected_person):
    df = merged_df[(merged_df['Date_x'] == selected_date) & (merged_df['Person_x'] == selected_person)]
    plot = figure(x_axis_type='datetime', title=f'Values on {selected_date} for {selected_person}', 
                  sizing_mode='stretch_width', height=250, tools="")

    # Create a color mapper
    categories = sorted(df['Value_manual'].unique())
    color_mapper = factor_cmap('Value_manual', palette=Spectral10, factors=categories)

    # Create vbar plot
    # plot.vbar(x='Datetime', top='Value', source=df, width=timedelta(seconds=1), 
    #           color=color_mapper, legend_field='Manual Value')
    
    plot.line(x='Datetime', y='Value_withing',line_width=2, source=df,
              color='blue')

    plot.line(x='Datetime', y='Value_manual',line_width=2, source=df,
              color='red')

    # Name the y-axis levels
    plot.yaxis.ticker = [1, 2, 3, 4]
    plot.yaxis.major_label_overrides = {1: 'Deep', 2: 'Light', 3: 'REM', 4: 'Awake'}

    return plot

# Prepare date options for the dropdown
date_options = sorted(merged_df['Date_x'].unique())
person_options = ['P1', 'P2']
person_select = pn.widgets.Select(name='Select Person:', options=person_options, value=person_options[0])
date_select = pn.widgets.Select(name='Select Date:', options=[str(date) for date in date_options], value=str(date_options[0]))

# Interactive panel
@pn.depends(date_select.param.value, person_select.param.value)
def update_plot(date, person):
    return create_vline2_plot(pd.to_datetime(date).date(), person)



# Display the panel
dashboard = pn.Column(
    pn.Row(date_select, person_select),
    pn.Row(update_plot)
)

# Display the panel
dashboard.servable()



In this step, Cohen's kappa (scikit-learn's cohen_kappa_score) is used to measure the agreement between the two sources for each day. The results are presented below.
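For reference, the standard definition of Cohen's kappa compares the observed agreement with the agreement expected by chance. In the notation below, which is introduced here only for illustration, \(n_{ij}\) is the number of seconds labelled stage \(i\) by one source and stage \(j\) by the other, \(n_{k\cdot}\) and \(n_{\cdot k}\) are the row and column totals, and \(N\) is the total number of seconds in the day.

```latex
\[
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_o = \frac{1}{N}\sum_{k} n_{kk},
\qquad
p_e = \frac{1}{N^{2}}\sum_{k} n_{k\cdot}\, n_{\cdot k}
\]
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and the ratio is undefined when \(p_e = 1\).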

In [13]:
from sklearn.metrics import cohen_kappa_score
# Group by day (extracting date from 'Datetime')
merged_df['Date'] = merged_df['Datetime'].dt.date
grouped = merged_df.groupby('Date')

# Calculate Cohen's Kappa for each group
kappa_scores = {}
for name, group in grouped:
    kappa = cohen_kappa_score(group['Value_withing'], group['Value_manual'])
    kappa_scores[name] = kappa

kappa_scores



{datetime.date(2023, 12, 27): 0.5716284959097035,
 datetime.date(2023, 12, 28): 0.23665998693869283,
 datetime.date(2023, 12, 29): 0.13921497067264077,
 datetime.date(2023, 12, 30): 0.024358562707859854,
 datetime.date(2023, 12, 31): 0.3979093068061551,
 datetime.date(2024, 1, 1): 0.19337647203980524,
 datetime.date(2024, 1, 2): 0.2033429107834196,
 datetime.date(2024, 1, 3): 0.08264476283882494,
 datetime.date(2024, 1, 4): 0.466878969526655,
 datetime.date(2024, 1, 5): 0.20454874709523174,
 datetime.date(2024, 1, 6): 0.37363909814766605,
 datetime.date(2024, 1, 7): 0.3675044463962901,
 datetime.date(2024, 1, 8): nan,
 datetime.date(2024, 1, 9): 0.3086384023642127,
 datetime.date(2024, 1, 10): 0.20777421252867367,
 datetime.date(2024, 1, 11): 0.24184459684190573}

| Kappa Value Range | Agreement Level |
|-------------------|-----------------|
| \(\kappa \leq 0\) | Less than chance agreement |
| \(0 < \kappa \leq 0.20\) | Slight agreement |
| \(0.20 < \kappa \leq 0.40\) | Fair agreement |
| \(0.40 < \kappa \leq 0.60\) | Moderate agreement |
| \(0.60 < \kappa \leq 0.80\) | Substantial agreement |
| \(0.80 < \kappa \leq 1\) | Almost perfect agreement |
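The bands in the table can be applied mechanically. The helper below is a small sketch, not part of the original analysis; `agreement_level` is a hypothetical name, and it treats every upper bound as inclusive so that no score falls between two bands.

```python
import math


def agreement_level(kappa):
    """Map a kappa score to the agreement band used in the table."""
    if math.isnan(kappa):
        return "Undefined"
    if kappa <= 0:
        return "Less than chance agreement"
    bands = [
        (0.20, "Slight agreement"),
        (0.40, "Fair agreement"),
        (0.60, "Moderate agreement"),
        (0.80, "Substantial agreement"),
        (1.00, "Almost perfect agreement"),
    ]
    for upper, name in bands:
        if kappa <= upper:
            return name
    raise ValueError("kappa cannot exceed 1")


# Three of the daily scores reported above
levels = [agreement_level(k) for k in (0.5716, 0.3979, 0.0244)]
```

Applied to a dictionary of daily scores, a comprehension such as `{day: agreement_level(k) for day, k in scores.items()}` would yield the grouping by band.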


Applying these guidelines to the unweighted daily scores above:

Moderate agreement:

- 2023-12-27: 0.57
- 2024-01-04: 0.47

Fair agreement:

- 2023-12-28: 0.24
- 2023-12-31: 0.40 (just below moderate)
- 2024-01-02: 0.20 (0.203, just above slight)
- 2024-01-05: 0.20 (0.205, just above slight)
- 2024-01-06: 0.37
- 2024-01-07: 0.37
- 2024-01-09: 0.31
- 2024-01-10: 0.21
- 2024-01-11: 0.24

Slight agreement:

- 2023-12-29: 0.14
- 2023-12-30: 0.02
- 2024-01-01: 0.19
- 2024-01-03: 0.08

Undefined (calculation not possible):

- 2024-01-08: nan

Weighted Cohen's kappa is used next because it allows disagreements to be weighted by their severity. This is particularly useful for ordered categories, where some types of disagreement are more severe than others. In sleep stage analysis, a disagreement between "Light Sleep" and "REM Sleep" might be considered less severe than a disagreement between "Awake" and "Deep Sleep".
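To make the effect of weighting concrete, the sketch below implements linear-weighted kappa in plain Python, with the penalty for a disagreement equal to the distance between the two stages on the 1 to 4 scale used in this notebook. It is an illustration written for this section and should correspond to the `weights='linear'` option, though it is not the scikit-learn implementation; the label sequences are invented.

```python
from collections import Counter


def linear_weighted_kappa(a, b, labels):
    """Cohen's kappa with linear weights |i - j| over the ordered `labels`."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(a)
    pos = {lab: i for i, lab in enumerate(labels)}
    pairs = Counter(zip(a, b))  # observed joint counts
    count_a, count_b = Counter(a), Counter(b)
    observed = sum(abs(pos[x] - pos[y]) * c for (x, y), c in pairs.items()) / n
    expected = sum(
        abs(pos[x] - pos[y]) * count_a[x] * count_b[y]
        for x in labels for y in labels
    ) / (n * n)
    if expected == 0:
        return float("nan")  # undefined when no disagreement is expected
    return 1 - observed / expected


stages = ["1", "2", "3", "4"]  # DEEP, LIGHT, REM, AWAKE
manual = ["2", "2", "3", "3", "4", "4", "1", "1"]
near = ["3", "2", "3", "3", "4", "4", "1", "1"]  # one LIGHT read as REM
far = ["2", "2", "3", "3", "1", "4", "1", "1"]   # one AWAKE read as DEEP

kappa_near = linear_weighted_kappa(manual, near, stages)
kappa_far = linear_weighted_kappa(manual, far, stages)
```

Both toy sequences differ from `manual` in exactly one position, so their unweighted kappa would be identical; with linear weights the AWAKE/DEEP confusion scores lower than the LIGHT/REM one.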

In [14]:
weights_type = 'linear'  # or 'quadratic' for quadratic weighting
weighted_kappa_scores = {}

for name, group in grouped:
    weighted_kappa = cohen_kappa_score(group['Value_withing'], group['Value_manual'], weights=weights_type)
    weighted_kappa_scores[name] = weighted_kappa

weighted_kappa_scores



{datetime.date(2023, 12, 27): 0.4721763744595161,
 datetime.date(2023, 12, 28): 0.2802891101613866,
 datetime.date(2023, 12, 29): 0.1524225541644023,
 datetime.date(2023, 12, 30): 0.04361875732402831,
 datetime.date(2023, 12, 31): 0.4601016355370574,
 datetime.date(2024, 1, 1): 0.1889881857448631,
 datetime.date(2024, 1, 2): 0.1522244100945087,
 datetime.date(2024, 1, 3): 0.0038792218039218618,
 datetime.date(2024, 1, 4): 0.525081177448308,
 datetime.date(2024, 1, 5): 0.25627309134294163,
 datetime.date(2024, 1, 6): 0.44349733602414776,
 datetime.date(2024, 1, 7): 0.3776363243678016,
 datetime.date(2024, 1, 8): nan,
 datetime.date(2024, 1, 9): 0.39456114965085454,
 datetime.date(2024, 1, 10): 0.22241020723330884,
 datetime.date(2024, 1, 11): 0.2626318714974073}

At this stage, the daily scores show agreement ranging from slight to moderate, depending on the day. A more in-depth analysis required additional data, and entering it manually was practically impossible. It was therefore decided to extract the necessary data from screenshots of the Fitbit application: all groups send me the device's output as an image, and the sleep stages are read from those images. The steps for this process are outlined below:
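The core of the extraction can be sketched independently of OpenCV: scan one pixel row for runs of non-background pixels, then map each run's horizontal position to a clock time. The function names below are hypothetical; the default chart bounds 44 and 1080 are the pixel columns used in the processing code that follows, and the few-pixel edge trimming applied there is left out for brevity.

```python
from datetime import datetime, timedelta


def find_runs(row, is_background):
    """Return (start, end) index pairs of consecutive non-background pixels."""
    runs, start = [], None
    for x, pixel in enumerate(row):
        if is_background(pixel):
            if start is not None:
                runs.append((start, x))
                start = None
        elif start is None:
            start = x
    if start is not None:  # run reaches the right edge of the row
        runs.append((start, len(row)))
    return runs


def x_to_time(x, night_start, night_seconds, x_min=44, x_max=1080):
    """Linearly map pixel column x in [x_min, x_max] to a clock time."""
    fraction = (x - x_min) / (x_max - x_min)
    return night_start + timedelta(seconds=round(fraction * night_seconds))


# Toy row: 0 is background, anything else belongs to a sleep-stage bar
runs = find_runs([0, 0, 1, 1, 1, 0, 1], lambda p: p == 0)

# An eight-hour night starting at midnight; column 562 is the chart midpoint
midpoint = x_to_time(562, datetime(2023, 12, 28, 0, 0, 0), 8 * 3600)
```

Because the mapping is linear, any error in the assumed chart bounds or in the recorded start and end times shifts every extracted interval proportionally.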

In [15]:
def list_image_files(folder_path):
    """Return the names (without extension) of all .jpg files in folder_path."""
    image_files = []
    for file in os.listdir(folder_path):
        if file.lower().endswith('.jpg'):
            image_files.append(os.path.splitext(file)[0])
    return image_files

def save_to_excel(file_names, excel_file):
    """Write the file names to a one-column Excel sheet headed 'File Names'."""
    wb = openpyxl.Workbook()
    ws = wb.active
    ws.append(['File Names'])
    for name in file_names:
        ws.append([name])
    wb.save(excel_file)

if __name__ == "__main__":
    folder_path = "./data/P2"
    excel_file = "time_images0.xlsx"
    image_files = list_image_files(folder_path)
    save_to_excel(image_files, excel_file)

In [16]:

import cv2
import numpy as np
import pandas as pd
import os
import datetime
# Chart background colour in BGR order (the channel order OpenCV uses)
background_color = (78, 36, 37)
# Function to check if a pixel matches the background color
def is_background(pixel, background_tolerance):
    return all(abs(pixel[i] - background_color[i]) <= background_tolerance for i in range(3))

# Function to find the Value intervals for a given Y position
def find_intervals(image, position, background_tolerance):
    intervals = []
    in_phase = False
    start = None
    for x in range(image.shape[1]):
        pixel = image[position, x]
        if is_background(pixel, background_tolerance):
            if in_phase:
                intervals.append((start + 2, x - 2))
                in_phase = False
        else:
            if not in_phase:
                start = x
                in_phase = True
    if in_phase:
        intervals.append((start, image.shape[1]))
    return intervals


# Function to process each image
def process_image(file_path, min_duration=0):
    # Load the image
    image = cv2.imread(file_path)
    file_name = os.path.basename(file_path)
    date_var = datetime.datetime.strptime(file_name.split('.')[0], "%Y-%m-%d").date()

    dir_path = os.path.dirname(file_path)
    person_name = os.path.basename(dir_path)

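    if person_name == "P1":
        # Define Y positions
        positions = {
            "Awake": 427,
            "REM": 517,
            "Light": 602,
            "Deep": 693
        }
        background_tolerance = 8

    elif person_name == "P2":
        # Define Y positions
        positions = {
            "Awake": 379,
            "REM": 517,
            "Light": 653,
            "Deep": 795
        }
        background_tolerance = 12

    else:
        # Fail early instead of hitting an undefined 'positions' further down
        raise ValueError(f"No chart layout defined for person '{person_name}'")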

    # Extract Value intervals for each position
    phase_intervals = {}
    for Value, position in positions.items():
        intervals = find_intervals(image, position, background_tolerance)
        phase_intervals[Value] = intervals

    # Load start and end times from the per-person workbook (e.g. P1.xlsx)
    time_excel = person_name + ".xlsx"
    time_df = pd.read_excel(time_excel)
    start_time_str = time_df.loc[time_df['File Names'] == os.path.splitext(file_name)[0], 'Start Time'].values[0]
    end_time_str = time_df.loc[time_df['File Names'] == os.path.splitext(file_name)[0], 'End Time'].values[0]

    # Convert start and end times to datetime objects
    start_time = pd.to_datetime(start_time_str, format="%H:%M:%S")
    end_time = pd.to_datetime(end_time_str, format="%H:%M:%S")
    delta_time = end_time - start_time

    if delta_time.days < 0:
        delta_time = pd.Timedelta(days=1) - abs(delta_time)

    # Create a dataframe to store the extracted data
    data = {"Value": [], "Date": [], "Person": [], "Start Time": [], "End Time": [], "Duration": []}

    # Draw the detected phases on the image
    detected_image = image.copy()
    colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]  # Colors for different phases

    for Value, intervals in phase_intervals.items():
        phase_index = list(positions.keys()).index(Value)
        for j, (start, end) in enumerate(intervals):

            # Filter intervals based on minimum duration
            if end - start < min_duration:
                continue

            # Calculate start and end time for this occurrence
            start_seconds = np.round((start - 44) * (delta_time).total_seconds() / (1080 - 44))
            end_seconds = np.round((end - 44) * (delta_time).total_seconds() / (1080 - 44))
            occurrence_start_time = start_time + pd.Timedelta(seconds=start_seconds)
            occurrence_end_time = start_time + pd.Timedelta(seconds=end_seconds)



            # Calculate duration for this occurrence
            duration_seconds = np.round((end - start) * (delta_time).total_seconds() / (1080 - 44))
            duration_time = pd.Timedelta(seconds=duration_seconds)
            duration_modified = (datetime.datetime.min + duration_time).time()

            occurrence_start_time = occurrence_start_time.time()
            occurrence_end_time = occurrence_end_time.time()



            # Store data
            data["Value"].append(Value)
            data["Date"].append(date_var)
            data["Person"].append(person_name)
            data["Start Time"].append(occurrence_start_time)
            data["End Time"].append(occurrence_end_time)
            data["Duration"].append(duration_modified)

            # Draw on image
            # cv2.putText(detected_image, f"{j+1}: {occurrence_start_time} - {occurrence_end_time} ({duration})",
            #             (start, positions[Value]), cv2.FONT_HERSHEY_SIMPLEX, 0.5, colors[phase_index], 2)
            cv2.line(detected_image, (start, positions[Value]), (end, positions[Value]), colors[phase_index], 1)

    # Convert data to DataFrame
    df = pd.DataFrame(data)

    # Sort DataFrame by start time
    df.sort_values(by='Start Time', inplace=True)
    # df.reset_index(drop=True, inplace=True)


    # Output folder for the annotated images
    output_folder = "output_images"

    # Save the image with detected phases
    output_image_path = os.path.join(output_folder, person_name, file_name)
    cv2.imwrite(output_image_path, detected_image)

    return df




def main():

    # Ensure xlsxwriter is installed
    try:
        import xlsxwriter
    except ImportError:
        print("Please install the xlsxwriter module: pip install xlsxwriter")
        return

    # Define input and output folders
    input_folder = "data"
    output_folder = "output_images"
    output_excel = "output_data.xlsx"


    subfolders = [os.path.join(input_folder, o) for o in os.listdir(input_folder) 
                  if os.path.isdir(os.path.join(input_folder,o))]
    main_fitbit_df = pd.DataFrame()
    for subfolder in subfolders:
        subfolder_name = os.path.basename(subfolder)  # Extract the subfolder name
            # Create output folder if it doesn't exist
        os.makedirs(os.path.join(output_folder, subfolder_name), exist_ok=True)
        for file in os.listdir(subfolder):
            if file.endswith(('.jpg', '.jpeg', '.png', '.PNG', '.bmp', '.gif')):
                file_path = os.path.join(subfolder, file)
                df = process_image(file_path, min_duration=5)  # process_image is defined above
                if not df.empty:
                    main_fitbit_df = pd.concat([main_fitbit_df, df], ignore_index=True)
    print(main_fitbit_df.head(50))
    main_fitbit_df.to_excel(output_excel, index=False)


if __name__ == "__main__":
    main()


    Value        Date Person Start Time  End Time  Duration
0     REM  2023-12-28     P1   00:04:00  00:09:43  00:05:43
1   Awake  2023-12-28     P1   00:09:43  00:13:39  00:03:56
2   Light  2023-12-28     P1   00:13:39  00:35:47  00:22:09
3    Deep  2023-12-28     P1   00:35:47  00:41:30  00:05:43
4   Light  2023-12-28     P1   00:41:30  01:00:04  00:18:34
5     REM  2023-12-28     P1   01:00:04  01:11:51  00:11:47
6   Light  2023-12-28     P1   01:11:51  01:20:47  00:08:56
7     REM  2023-12-28     P1   01:20:47  01:42:56  00:22:09
8   Light  2023-12-28     P1   01:42:56  01:51:51  00:08:56
9    Deep  2023-12-28     P1   01:51:51  01:56:51  00:05:00
10  Light  2023-12-28     P1   01:57:13  02:51:30  00:54:17
11   Deep  2023-12-28     P1   02:51:30  03:05:26  00:13:56
12  Light  2023-12-28     P1   03:05:26  03:09:00  00:03:34
13   Deep  2023-12-28     P1   03:09:00  03:23:17  00:14:17
14  Light  2023-12-28     P1   03:23:39  04:40:26  01:16:47
15    REM  2023-12-28     P1   04:40:26 

At this stage, I successfully extracted my own data from the photos. However, the pictures received from the consensus group were in varied formats and of poor quality, which made it impossible to apply the same procedure to their data. If the group members can be contacted again, it may be possible to obtain photos of the same quality and advance the project.