# Task 1: Manually complete a benchmark

I choose a task with instance_id = 44. Below are the details of the task.

**"task_inst"** (str): task goal description and output formatting instruction

Analyze the inertial measurement unit (IMU) data collected during sleep and compute sleep endpoints. Load the given data and compute the following sleep endpoints: time of falling asleep, time of awakening, and total duration spent sleeping. The three values should be saved in a JSON file "pred_results/imu_pred.json", and the keys for them are "sleep_onset", "wake_onset", and "total_sleep_duration", respectively.

**"github_name"** (str): the original github repository each task is adapted from

mad-lab-fau/BioPsyKit

**"domain_knowledge"** (str): expert-annotated information about the task

Using the function sleep_processing_pipeline.predict_pipeline_acceleration() in BioPsyKit to perform the sleep processing pipeline. BioPsyKit is a Python package for the analysis of biopsychological data.

**"dataset_folder_tree"** (str): string representation of dataset directory structure for each task

|-- sleep_imu_data/   
|---- sleep_data.pkl

**"src_file_or_path"** (str): source program location in the original github repository that is adapted

examples/Sleep_IMU_Example.ipynb

**"gold_program_name"** (str): name of annotated program (reference solution) for each task

imu.py

**"output_fname"** (str): output location to save the generated program for each task

pred_results/imu_pred.json

**"dataset_preview"** (str): string representation of the first few examples/lines in dataset files used in each task

[START Preview of sleep_imu_data/sleep_data.pkl]
time        gyr_x        gyr_y        gyr_z        acc_x        acc_y        acc_z
2019-09-03 02:06:18.251953+02:00        -0.1220703125        -0.244140625        0.244140625        3.7553906250000004        -6.552773437500001        6.121669921875
2019-09-03 02:06:18.256836+02:00        -0.42724609375        -0.1220703125        0.1220703125        3.7122802734375004        -6.5958837890625        6.140830078125
2019-09-03 02:06:18.261718+02:00        -0.42724609375        -0.3662109375        0.18310546875        3.7122802734375004        -6.557563476562501        6.054609375
...
[END Preview of sleep_imu_data/sleep_data.pkl]



When I get the task, I find the dataset to be analysed. Then I look up how to load a .pkl file.

In [1]:
# load .pkl file
import pickle


In [2]:
file_path = 'sleep_imu_data/sleep_data.pkl'  # Replace with the actual path to your file
with open(file_path, 'rb') as file:
    # Load the data
    data = pickle.load(file)
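
The same loading step can be sketched with `pathlib`, which builds the path from its parts so the separator does not depend on the operating system. The helper name `load_pickle` and the stand-in dictionary are hypothetical and only serve the demonstration; the real file holds a pandas DataFrame.

```python
import pickle
import tempfile
from pathlib import Path


def load_pickle(path):
    """Load a pickled object from `path` (hypothetical helper, not part of BioPsyKit)."""
    with Path(path).open("rb") as file:
        return pickle.load(file)


# Demonstration with a stand-in object written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "sleep_imu_data" / "sleep_data.pkl"
    demo_path.parent.mkdir(parents=True)
    stand_in = {"columns": ["gyr_x", "gyr_y", "gyr_z", "acc_x", "acc_y", "acc_z"]}
    demo_path.write_bytes(pickle.dumps(stand_in))
    loaded = load_pickle(demo_path)

print(loaded["columns"])
```

For the task itself, the call would be `load_pickle(Path("sleep_imu_data") / "sleep_data.pkl")`.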

In [3]:
data.head()

time,gyr_x,gyr_y,gyr_z,acc_x,acc_y,acc_z
2019-09-03 02:06:18.251953+02:00,-0.12207,-0.244141,0.244141,3.755391,-6.552773,6.12167
2019-09-03 02:06:18.256836+02:00,-0.427246,-0.12207,0.12207,3.71228,-6.595884,6.14083
2019-09-03 02:06:18.261718+02:00,-0.427246,-0.366211,0.183105,3.71228,-6.557563,6.054609
2019-09-03 02:06:18.266601+02:00,-0.488281,-0.183105,0.12207,3.750601,-6.567144,6.09772
2019-09-03 02:06:18.271484+02:00,-0.732422,0.244141,0.0,3.73144,-6.547983,6.11688


In [4]:
data.shape

(4048332, 6)

When I look at this dataset, I have no idea what each column means. So I search for a description of inertial measurement unit (IMU) data. Wikipedia says that an inertial measurement unit (IMU) is an electronic device that measures and reports a body's specific force, angular rate, and sometimes the orientation of the body, using a combination of accelerometers, gyroscopes, and sometimes magnetometers.

Then I search for how to handle IMU data when computing sleep endpoints. I find that [this repo](https://github.com/mad-lab-fau/BioPsyKit) has a solution. The authors created a Python package for the analysis of biopsychological data, including algorithms and data processing pipelines for sleep/wake prediction and computation of sleep endpoints based on activity or IMU data. So I install it with pip install biopsykit.

From the documentation, I also learn that an inertial measurement unit (IMU) is a sensor that measures a body's acceleration (using accelerometers) and angular rate (using gyroscopes). In medical and psychological applications, IMUs are commonly used for activity monitoring, movement analysis, and more.
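
A quick way to connect this description to the columns is to compute the magnitude of the acceleration vector for one sample. The sketch below uses the first row of the dataset preview and assumes the acc_* columns are in m/s² (the task description does not state the unit). Under that assumption, a sensor that is nearly at rest should report a magnitude close to gravity, about 9.81 m/s².

```python
import math

# First row of the dataset preview: (acc_x, acc_y, acc_z).
# Assumption: values are in m/s^2; the task description does not state the unit.
acc_first_row = (3.7553906250000004, -6.552773437500001, 6.121669921875)

# Euclidean norm of the acceleration vector.
acc_norm = math.sqrt(sum(component**2 for component in acc_first_row))
print(round(acc_norm, 2))  # 9.72
```

The result of about 9.72 is close to gravity, which is consistent with the assumed unit and with a sensor worn by someone lying still.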

Now I want to compute the following sleep endpoints: time of falling asleep, time of awakening, and total duration spent sleeping. Reading the biopsykit documentation, I find [Sleep_IMU_Example.ipynb](https://github.com/mad-lab-fau/BioPsyKit/blob/main/examples/Sleep_IMU_Example.ipynb), which walks through the processing of IMU data to predict sleep/wake in detail.

Here is the code to complete the task. 

In [5]:
# pip install biopsykit

In [6]:
import biopsykit as bp
import json



In [7]:
# load the example data to get its sampling rate fs (in Hz)
data_example, fs = bp.example_data.get_sleep_imu_example()
fs

204.8
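
Because fs comes from the example data rather than from the task's own file, it is worth checking that it also fits `data`. The sketch below estimates the sampling rate from the first five timestamps shown in the data.head() output and converts the row count from data.shape into a recording duration. It uses only the standard library and the values printed above; the variable names are illustrative.

```python
from datetime import datetime

# First five timestamps of `data`, copied from the data.head() output.
stamps = [
    "2019-09-03 02:06:18.251953+02:00",
    "2019-09-03 02:06:18.256836+02:00",
    "2019-09-03 02:06:18.261718+02:00",
    "2019-09-03 02:06:18.266601+02:00",
    "2019-09-03 02:06:18.271484+02:00",
]
times = [datetime.fromisoformat(stamp) for stamp in stamps]

# Number of sampling intervals divided by the time they span gives the rate in Hz.
span_seconds = (times[-1] - times[0]).total_seconds()
fs_estimate = (len(times) - 1) / span_seconds
print(round(fs_estimate, 1))  # 204.8

# Recording length implied by the row count in data.shape at this rate.
n_samples = 4048332
duration_hours = n_samples / 204.8 / 3600
print(round(duration_hours, 2))  # 5.49
```

The estimate matches the 204.8 Hz of the example data, and the implied recording length of roughly five and a half hours is plausible for one night.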

In [8]:
# Run the sleep processing pipeline on the example data and extract the sleep endpoints
sleep_results = bp.sleep.sleep_processing_pipeline.predict_pipeline_acceleration(
    data_example, sampling_rate=fs
)
sleep_endpoints = sleep_results["sleep_endpoints"]


sleep_wake
1.0           243
0.0           152
Name: count, dtype: int64


After running the example code provided by the biopsykit contributors, I think it should also work for the data provided by the authors of ScienceAgentBench, so I reuse the same sampling rate fs.

In [9]:
# Run the same pipeline on the task data
sleep_results = bp.sleep.sleep_processing_pipeline.predict_pipeline_acceleration(
    data, sampling_rate=fs
)
sleep_endpoints = sleep_results["sleep_endpoints"]


OutOfBoundsDatetime: cannot convert input with unit 'us'

Then I encounter an error. It seems that something is wrong with the time unit of the input data. So I check whether the datetime index format differs between data and data_example.

In [10]:
data_example.index

DatetimeIndex(['2019-09-03 01:01:12.001953+02:00',
               '2019-09-03 01:01:12.006836+02:00',
               '2019-09-03 01:01:12.011718+02:00',
               '2019-09-03 01:01:12.016601+02:00',
               '2019-09-03 01:01:12.021484+02:00',
               '2019-09-03 01:01:12.026367+02:00',
               '2019-09-03 01:01:12.031250+02:00',
               '2019-09-03 01:01:12.036132+02:00',
               '2019-09-03 01:01:12.041015+02:00',
               '2019-09-03 01:01:12.045898+02:00',
               ...
               '2019-09-03 07:35:45.449218+02:00',
               '2019-09-03 07:35:45.454101+02:00',
               '2019-09-03 07:35:45.458984+02:00',
               '2019-09-03 07:35:45.463867+02:00',
               '2019-09-03 07:35:45.468750+02:00',
               '2019-09-03 07:35:45.473632+02:00',
               '2019-09-03 07:35:45.478515+02:00',
               '2019-09-03 07:35:45.483398+02:00',
               '2019-09-03 07:35:45.488281+02:00',
            

In [11]:
data.index

DatetimeIndex(['2019-09-03 02:06:18.251953+02:00',
               '2019-09-03 02:06:18.256836+02:00',
               '2019-09-03 02:06:18.261718+02:00',
               '2019-09-03 02:06:18.266601+02:00',
               '2019-09-03 02:06:18.271484+02:00',
               '2019-09-03 02:06:18.276367+02:00',
               '2019-09-03 02:06:18.281250+02:00',
               '2019-09-03 02:06:18.286132+02:00',
               '2019-09-03 02:06:18.291015+02:00',
               '2019-09-03 02:06:18.295898+02:00',
               ...
               '2019-09-03 07:35:45.449218+02:00',
               '2019-09-03 07:35:45.454101+02:00',
               '2019-09-03 07:35:45.458984+02:00',
               '2019-09-03 07:35:45.463867+02:00',
               '2019-09-03 07:35:45.468750+02:00',
               '2019-09-03 07:35:45.473632+02:00',
               '2019-09-03 07:35:45.478515+02:00',
               '2019-09-03 07:35:45.483398+02:00',
               '2019-09-03 07:35:45.488281+02:00',
            

The printed timestamps look alike in both cases, so the difference is not in the values themselves. The problem lies in the unit (resolution) of the DataFrame's index, which is not microseconds.

Then I search Google for "DatetimeIndex unit change". Below is how to convert the DatetimeIndex.

In [12]:
data.index = data.index.as_unit('us')
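
Since the printed index looks the same before and after the conversion, the resolution has to be inspected directly. The sketch below shows one way to do that on a small hypothetical index built from the first three timestamps of the preview. It assumes pandas 2.0 or later, where `DatetimeIndex.unit` and `DatetimeIndex.as_unit` are available; the initial unit it prints depends on the pandas version and on how the index was created.

```python
import pandas as pd

# Hypothetical miniature index built from the first three preview timestamps.
idx = pd.DatetimeIndex(
    [
        "2019-09-03 02:06:18.251953+02:00",
        "2019-09-03 02:06:18.256836+02:00",
        "2019-09-03 02:06:18.261718+02:00",
    ]
)
print(idx.unit)  # resolution of the index as created

# Convert the resolution to microseconds; the timestamps themselves are unchanged.
converted = idx.as_unit("us")
print(converted.unit)
print(converted.dtype)
```

Printing `data.index.unit` before and after In [12] in the same way would make the change visible even though the index values print identically.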

In [13]:
data.index

DatetimeIndex(['2019-09-03 02:06:18.251953+02:00',
               '2019-09-03 02:06:18.256836+02:00',
               '2019-09-03 02:06:18.261718+02:00',
               '2019-09-03 02:06:18.266601+02:00',
               '2019-09-03 02:06:18.271484+02:00',
               '2019-09-03 02:06:18.276367+02:00',
               '2019-09-03 02:06:18.281250+02:00',
               '2019-09-03 02:06:18.286132+02:00',
               '2019-09-03 02:06:18.291015+02:00',
               '2019-09-03 02:06:18.295898+02:00',
               ...
               '2019-09-03 07:35:45.449218+02:00',
               '2019-09-03 07:35:45.454101+02:00',
               '2019-09-03 07:35:45.458984+02:00',
               '2019-09-03 07:35:45.463867+02:00',
               '2019-09-03 07:35:45.468750+02:00',
               '2019-09-03 07:35:45.473632+02:00',
               '2019-09-03 07:35:45.478515+02:00',
               '2019-09-03 07:35:45.483398+02:00',
               '2019-09-03 07:35:45.488281+02:00',
            

Then I run the pipeline to see if it works.

In [14]:
# Re-run the pipeline on the task data with the converted index
sleep_results = bp.sleep.sleep_processing_pipeline.predict_pipeline_acceleration(
    data, sampling_rate=fs
)
sleep_endpoints = sleep_results["sleep_endpoints"]


sleep_wake
1.0           188
0.0           142
Name: count, dtype: int64


In [15]:
results = {
    "sleep_onset": sleep_endpoints["sleep_onset"],
    "wake_onset": sleep_endpoints["wake_onset"],
    "total_sleep_duration": sleep_endpoints["total_sleep_duration"],
}


In [16]:
import os

In [17]:
# Create folder if it doesn't exist
os.makedirs("pred_results", exist_ok=True)

with open("pred_results/imu_pred.json", "w", encoding="utf-8") as f:
    json.dump(results, f)
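
To confirm that the output has the required shape, the file can be read back and its keys compared with the ones named in the task. The sketch below does this round trip in a temporary folder with placeholder values, not the real pipeline output. Passing `default=str` is an assumption added here as a fallback for values that `json` cannot serialize natively, such as timestamp objects.

```python
import json
import tempfile
from datetime import datetime
from pathlib import Path

# Placeholder values for illustration only, NOT the real pipeline output.
results = {
    "sleep_onset": datetime(2000, 1, 1, 0, 0),  # arbitrary timestamp object
    "wake_onset": "placeholder wake time",
    "total_sleep_duration": 0,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "pred_results" / "imu_pred.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", encoding="utf-8") as f:
        # default=str (an assumption) turns non-JSON types into strings
        json.dump(results, f, default=str)
    with out_path.open(encoding="utf-8") as f:
        reloaded = json.load(f)

# The task requires exactly these three keys.
expected_keys = {"sleep_onset", "wake_onset", "total_sleep_duration"}
print(set(reloaded) == expected_keys)  # True
```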

The total time taken was about 90 minutes.