# Data Uploader

This notebook demonstrates how to upload data from EDF files to a database and to Delta Lake storage.
It covers setting up and running the upload, then querying the uploaded data for analysis.

### To upload data:
The `edf_file_paths` list contains the paths to the EDF files that we want to upload. 
These files are located in the `../data/files/` directory and are named `test12_Wednesday_05_DAY1_PROCESSED.edf` and `test12_Wednesday_05_DAY2_PROCESSED.edf`.
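
Because the paths are relative to the notebook's working directory, it can help to confirm that they resolve before starting an upload. The following is a minimal sketch of such a check; `find_missing_files` is a hypothetical helper written for illustration and is not part of `DataUploader`.

```python
from pathlib import Path


def find_missing_files(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


edf_file_paths = [
    "../data/files/test12_Wednesday_05_DAY1_PROCESSED.edf",
    "../data/files/test12_Wednesday_05_DAY2_PROCESSED.edf",
]

missing = find_missing_files(edf_file_paths)
if missing:
    print("Missing EDF files:", missing)
else:
    print("All EDF files found.")
```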

The `metadata_file_path` variable holds the path to the CSV file containing metadata for the EDF files. 
This file is also located in the `../data/files/` directory and is named `Sleep Study Metadata.csv`.
The `metadata_map` dictionary maps the columns in the CSV metadata file to the corresponding database fields.
The keys in the dictionary represent the fields in the database, and the values represent the column names in the CSV file.
For example:

- "animal" maps to the "Nickname" column in the CSV file.
- "deployment" maps to the "Deployment" column in the CSV file.
- "logger" maps to the "Logger Used" column in the CSV file.
- "recording" maps to the "Recording ID" column in the CSV file.
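
To make the direction of the mapping concrete, the sketch below applies a `metadata_map` to one CSV row using only the standard library. It is an illustration under assumptions, not how `DataUploader` processes metadata internally: `apply_metadata_map` is a hypothetical helper, and the CSV values are placeholders rather than real entries from `Sleep Study Metadata.csv`.

```python
import csv
import io

metadata_map = {
    "animal": "Nickname",
    "deployment": "Deployment",
    "logger": "Logger Used",
    "recording": "Recording ID",
}


def apply_metadata_map(row, mapping):
    """Build a dict keyed by database field, reading each value from its CSV column."""
    return {field: row[column] for field, column in mapping.items()}


# Placeholder CSV content (hypothetical values, for illustration only).
sample_csv = io.StringIO(
    "Nickname,Deployment,Logger Used,Recording ID\n"
    "example-animal,example-deployment,example-logger,example-recording\n"
)

records = [apply_metadata_map(row, metadata_map) for row in csv.DictReader(sample_csv)]
print(records[0])
```

Each resulting record is keyed by the database field names (`animal`, `deployment`, `logger`, `recording`), whatever the CSV column headers are called.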


In [1]:
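import os

# Jupyter runs an async event loop; this setting lets Django's synchronous ORM calls run inside it.
os.environ["DJANGO_ALLOW_ASYNC_UNSAFE"] = "true"

import importlib
import services.data_uploader
import services.metadata_manager

# Reload the service modules so edits to them take effect without restarting the kernel.
importlib.reload(services.data_uploader)
importlib.reload(services.metadata_manager)
from services.data_uploader import DataUploader

data_uploader = DataUploader()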

edf_file_paths = [
    "../data/files/test12_Wednesday_05_DAY1_PROCESSED.edf",
    "../data/files/test12_Wednesday_05_DAY2_PROCESSED.edf"
]

metadata_file_path = "../data/files/Sleep Study Metadata.csv"

metadata_map = {
    "animal": "Nickname",
    "deployment": "Deployment",
    "logger": "Logger Used",
    "recording": "Recording ID"
}

# Uncomment the next line to run the upload.
# data_uploader.upload_edf(edf_file_paths, metadata_file_path, metadata_map)

Connecting to DuckDB
Installing PostgreSQL extension
Loading PostgreSQL extension
Connecting to PostgreSQL
postgresql://divedbuser:divedbpassword@postgres:postgres:5432/divedb
Connecting to DuckDB
Installing PostgreSQL extension
Loading PostgreSQL extension
Connecting to PostgreSQL
postgresql://divedbuser:divedbpassword@postgres:postgres:5432/divedb


In [3]:
from services.duck_pond import DuckPond

duckpond = DuckPond()

query = f"""
SELECT 
    MIN(data) as min_odba,
    MAX(data) as max_odba,
    AVG(data) as avg_odba
FROM 
    delta_scan('{os.environ["CONTAINER_DELTA_LAKE_PATH"]}')
WHERE 
    signal_name = 'ODBA';
"""

df = duckpond.conn.execute(query).pl()
display(df)

min_odba,max_odba,avg_odba
f64,f64,f64
0.001447,48.607994,0.283147


In [8]:
query = f"""
SELECT 
    data
FROM 
    delta_scan('{os.environ["CONTAINER_DELTA_LAKE_PATH"]}')
WHERE 
    signal_name = 'ECG_ICA2';
"""

df = duckpond.conn.execute(query).pl()
display(df)

data
f64
10.927123
14.70782
11.872297
17.070756
9.036774
…
36.919417
35.974243
47.788922
58.658427


In [11]:
from pyologger.process_data.feature_generation_utils import get_heart_rate

query = f"""
SELECT 
    data
FROM 
    delta_scan('{os.environ["CONTAINER_DELTA_LAKE_PATH"]}')
WHERE 
    signal_name = 'ECG_ICA2'
LIMIT 1000000;
"""

df = duckpond.conn.execute(query).pl()
display(df)

get_heart_rate(df["data"])

data
f64
10.927123
14.70782
11.872297
17.070756
9.036774
…
71.890867
62.439124
62.911711
65.747234


Filled 7 bad heart rate values


0         0.0
1         0.0
2         0.0
3         0.0
4         0.0
         ... 
999995    0.0
999996    0.0
999997    0.0
999998    0.0
999999    0.0
Length: 1000000, dtype: float64