# Data Uploader

This notebook demonstrates how to upload EDF file data to a database and to Delta Lake storage.
It covers setting up and running the upload, then querying the uploaded data for analysis.

### To upload data:
The `edf_file_paths` list contains the paths to the EDF files that we want to upload. 
These files are located in the `../data/files/` directory and are named `test12_Wednesday_05_DAY1_PROCESSED.edf` and `test12_Wednesday_05_DAY2_PROCESSED.edf`.

The `metadata_file_path` variable holds the path to the CSV file containing metadata for the EDF files. 
This file is also located in the `../data/files/` directory and is named `Sleep Study Metadata.csv`.
The `metadata_map` dictionary maps database fields to columns in the CSV metadata file.
Each key is a field in the database, and each value is the name of the corresponding CSV column.
For example:

- `animal` maps to the `Nickname` column in the CSV file.
- `deployment` maps to the `Deployment` column in the CSV file.
- `logger` maps to the `Logger Used` column in the CSV file.
- `recording` maps to the `Recording ID` column in the CSV file.
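
To make the direction of the mapping concrete, here is a minimal sketch of how a field-to-column map like this could be applied to one CSV row. It is not taken from `DataUploader`, whose internals are not shown in this notebook; the helper `apply_metadata_map` and the sample rows are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical stand-in for "Sleep Study Metadata.csv"; the values are invented.
sample_csv = io.StringIO(
    "Nickname,Deployment,Logger Used,Recording ID\n"
    "Animal-A,Deployment-1,Logger-X,Rec-001\n"
    "Animal-B,Deployment-2,Logger-Y,Rec-002\n"
)

# Same shape as the notebook's map: database field -> CSV column name.
metadata_map = {
    "animal": "Nickname",
    "deployment": "Deployment",
    "logger": "Logger Used",
    "recording": "Recording ID",
}


def apply_metadata_map(row, mapping):
    """Return a dict keyed by database field, with values read from the mapped CSV columns."""
    missing = [column for column in mapping.values() if column not in row]
    if missing:
        raise KeyError(f"CSV is missing mapped columns: {missing}")
    return {field: row[column] for field, column in mapping.items()}


records = [apply_metadata_map(row, metadata_map) for row in csv.DictReader(sample_csv)]
print(records[0])
```

Checking for missing columns up front is a design choice in this sketch: a misspelled CSV header then fails early with a clear message instead of partway through an upload.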


In [42]:
import os

os.environ["DJANGO_ALLOW_ASYNC_UNSAFE"] = "true"

import shutil
import importlib

import services.data_uploader
importlib.reload(services.data_uploader)
from services.data_uploader import DataUploader

data_uploader = DataUploader()

edf_file_paths = [
    "../data/files/test12_Wednesday_05_DAY1_PROCESSED.edf",
    "../data/files/test12_Wednesday_05_DAY2_PROCESSED.edf"
]

metadata_file_path = "../data/files/Sleep Study Metadata.csv"

metadata_map = {
    "animal": "Nickname",
    "deployment": "Deployment",
    "logger": "Logger Used",
    "recording": "Recording ID"
}


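# Start from a clean slate: delete the directory at os.environ["CONTAINER_DELTA_LAKE_PATH"].
# Warning: this permanently removes any previously uploaded Delta Lake data.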
delta_lake_path = os.environ.get("CONTAINER_DELTA_LAKE_PATH")
if delta_lake_path and os.path.exists(delta_lake_path):
    shutil.rmtree(delta_lake_path)
    print(f"Deleted directory: {delta_lake_path}")
else:
    print(f"Directory does not exist: {delta_lake_path}")



data_uploader.upload_edf(edf_file_paths, metadata_file_path, metadata_map)

Directory does not exist: /data/delta-lake
Uploading ../data/files/test12_Wednesday_05_DAY1_PROCESSED.edf to Swift
Uploading ../data/files/test12_Wednesday_05_DAY2_PROCESSED.edf to Swift
Processing 36 signals in 2 files.


Processing signals: 100%|██████████| 36/36 [11:42<00:00, 19.51s/it]

Upload complete.





In [3]:
import os
import importlib
import services.duck_pond
import services.utils.edf
importlib.reload(services.duck_pond)
importlib.reload(services.utils.edf)

from services.duck_pond import DuckPond
from services.utils.edf import create_mne_array, create_mne_edf

duckpond = DuckPond()

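# Note: 'data' in single quotes is a SQL string literal, so this query returns the
# constant string "data" for every matching row, as the output below shows.
# Use an unquoted column name (SELECT data ...) to return the signal values instead.
df = duckpond.conn.sql("SELECT 'data' FROM DeltaLake where signal_name = 'ECG_ICA2'")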

# df = duckpond.get_delta_data(
#     signal_names=["ECG_ICA2", "EEG_ICA5"],
#     date_range=(
#         "2019-10-26 14:46:21.008", 
#         "2019-10-26 14:46:21.008"
#     )
# )

print(df.pl())
# raw = create_mne_edf(df, "/data/test.edf")


shape: (86_400_500, 1)
┌────────┐
│ 'data' │
│ ---    │
│ str    │
╞════════╡
│ data   │
│ data   │
│ data   │
│ data   │
│ data   │
│ …      │
│ data   │
│ data   │
│ data   │
│ data   │
│ data   │
└────────┘
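
The output above is a single `str` column in which every row is the text `data`, because the query selects the string literal `'data'` rather than the `data` column. The sketch below reproduces the difference on a tiny in-memory table. It uses Python's built-in `sqlite3` as a stand-in for DuckDB so that it runs without extra dependencies, and the table contents are invented; only the quoting behaviour is the point.

```python
import sqlite3

# In-memory stand-in for the DeltaLake table; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DeltaLake (signal_name TEXT, data REAL)")
conn.executemany(
    "INSERT INTO DeltaLake VALUES (?, ?)",
    [("ECG_ICA2", 1.5), ("ECG_ICA2", -2.25), ("EEG_ICA5", 0.5)],
)

# Single quotes make 'data' a string literal: one constant per matching row.
literal_rows = conn.execute(
    "SELECT 'data' FROM DeltaLake WHERE signal_name = 'ECG_ICA2'"
).fetchall()

# Unquoted, data refers to the column, so the stored values come back.
column_rows = conn.execute(
    "SELECT data FROM DeltaLake WHERE signal_name = 'ECG_ICA2' ORDER BY rowid"
).fetchall()

print(literal_rows)
print(column_rows)
```

The next cell uses the unquoted form, `SELECT data`, which is why its output contains `f64` values.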


In [None]:
from pyologger.process_data.feature_generation_utils import get_heart_rate

query = """
SELECT data
FROM DeltaLake
WHERE signal_name = 'ECG_ICA2'
LIMIT 5000000;
"""

df = duckpond.conn.execute(query).pl()
display(df)

heart_rate = get_heart_rate(df["data"])

heart_rate.max()

data
f64
-142.663706
-142.663706
-142.663706
-142.663706
-142.663706
…
304.876341
116.786648
-112.890715
-423.380484


Filled 9 bad heart rate values


198.67549668874173