# Export EDF

This notebook demonstrates the process of exporting DiveDB data as an EDF file.

While under development, it also contains prototype (non-library) code, which will be deleted before this notebook is merged into the main branch.

Punch list:
- [x] Make a list
- [x] Understand task :) 
- [x] Prototype:
    - [x] Load basic metadata
    - [x] Load signals
    - [x] Generate EDF file 
        - [x] Can mne serve our needs here? Check support for multiple sample rates and arbitrary metadata: edfio handles both!
        - [x] Decide if different library OR extend mne: use edfio, which is what mne depends on 
    - [x] Test EDF file can be opened externally (e.g. through EDF.jl or other app)
    - [x] Test EDF encodes max/min values
    - [x] Get recording metadata
- [x] Turn prototype into library code
- [ ] Add metadata to EDF header
- [ ] Add tests
- [ ] Write up edge case tests
    - [ ] Make 'em pass OR file 'em
- [ ] Clean up this notebook (delete this punch list!)
- [ ] Mark PR ready for review
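
The "multiple sample rates" item above matters because an EDF file is split into fixed-duration data records, and each signal stores a whole number of samples per record. The following is a minimal sketch of that constraint only, not of how edfio or DiveDB pick the record duration. The helper name `smallest_record_duration` and the 50 Hz accelerometer rate are assumptions for illustration; the once-a-minute depth rate mirrors the `frequency=1/60` query used later in this notebook.

```python
from fractions import Fraction
from math import gcd, lcm


def smallest_record_duration(sample_rates_hz):
    """Return the smallest duration (seconds) in which every rate
    produces a whole number of samples. Hypothetical helper."""
    rates = [Fraction(r) for r in sample_rates_hz]
    if not rates or any(r <= 0 for r in rates):
        raise ValueError("need at least one positive sample rate")
    numerators = [r.numerator for r in rates]
    denominators = [r.denominator for r in rates]
    # For rates p_i/q_i in lowest terms, the smallest valid duration
    # is lcm(q_i) / gcd(p_i).
    return Fraction(lcm(*denominators), gcd(*numerators))


# Assumed rates: depth once a minute, accelerometer at 50 Hz.
rates = {"depth": Fraction(1, 60), "accelerometer": Fraction(50)}
duration = smallest_record_duration(rates.values())
samples_per_record = {name: int(rate * duration) for name, rate in rates.items()}
print(duration, samples_per_record)
```

Exact fractions are used instead of floats so that a rate such as 1/60 Hz does not pick up rounding error before the whole-number check.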

Reminder: this is the end goal

```python
# Example of usage once complete
import os

from DiveDB.services.duck_pond import DuckPond

duckpond = DuckPond(os.environ["CONTAINER_DELTA_LAKE_PATH"])

dive_data = duckpond.get_delta_data(
    labels=["eeg"],
    animal_ids="apfo-001a",
)

# Originally took a file path; now takes an output directory.
dive_data.export_to_edf("path_to_output_dir")
```

### Prototype

In [None]:
# Test previous behavior holds
import os
import importlib
import DiveDB.services.duck_pond as dp
importlib.reload(dp)

duckpond = dp.DuckPond(os.environ["CONTAINER_DELTA_LAKE_PATH"])

# Example from the querying_docs notebook
data = duckpond.get_delta_data(    
    labels=["derived_data_depth"],
    animal_ids="apfo-001a",
    frequency=1/60,  # Once a minute
)
display(data)


In [None]:

# Okay, but is there a way to find out what animal_ids, etc, are available?
# Time to go spelunking!
duckpond.get_db_schema()

# ...okay, cool. :) 

In [None]:
# Let's try a sql query as well (also ripped from the querying_docs notebook)
# Plain strings: no interpolation is needed, so the f-prefix is dropped.
labels_df = duckpond.conn.sql("""
    SELECT DISTINCT label
    FROM DataLake
""").df()
# display(labels_df)

animals_df = duckpond.conn.sql("""
    SELECT DISTINCT animal
    FROM DataLake
""").df()
# display(animals_df)

signal_df = duckpond.conn.sql("""
    SELECT DISTINCT class, label
    FROM DataLake
""").df()
type(signal_df)


In [None]:
import DiveDB.services.dive_data as dd
importlib.reload(dd)
import DiveDB.services.duck_pond as dp
importlib.reload(dp)

duckpond = dp.DuckPond(os.environ["CONTAINER_DELTA_LAKE_PATH"])

results = duckpond.get_delta_data(    
    classes=["derived_data_depth", "sensor_data_accelerometer"],
    animal_ids="apfo-001a",
    # limit=1000, # 0000
)
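# Only the last expression in a cell is rendered, so display the others explicitly.
display(results.get_metadata())
display(results.duckdb_relation.columns)
results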

In [None]:
import DiveDB.services.dive_data as dd
importlib.reload(dd)

outpaths = dd.export_to_edf(results, ".tmp/foo")
outpaths


In [None]:
# Sort the samples by timestamp and count them
import DiveDB.services.dive_data as dd
importlib.reload(dd)
df = results.duckdb_relation.df()
v = df.sort_values('datetime')['datetime']
v.size
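
A sorted timestamp column like the one above is the kind of input from which a signal's sampling rate could be estimated before writing it to EDF. The sketch below is an assumption about how that might be done, not the library's method: `estimate_sample_rate_hz` is a hypothetical helper, and the timestamps are synthetic rather than taken from DiveDB.

```python
from datetime import datetime, timedelta
from statistics import median


def estimate_sample_rate_hz(timestamps):
    """Estimate a sampling rate from the median gap between samples.
    Hypothetical helper; assumes roughly evenly spaced samples."""
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        raise ValueError("need at least two timestamps")
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    typical_gap = median(gaps)
    if typical_gap <= 0:
        raise ValueError("timestamps must not all be identical")
    return 1.0 / typical_gap


# Synthetic data: five samples spaced one minute apart.
start = datetime(2019, 11, 8, 12, 0, 0)
timestamps = [start + timedelta(seconds=60 * i) for i in range(5)]
rate = estimate_sample_rate_hz(timestamps)
print(rate)
```

The median is used rather than the mean so that an occasional dropped sample or logging gap does not skew the estimate.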

In [None]:
from edfio import read_edf

# Read the first exported file back to check that it round-trips
edf_roundtrip = read_edf(outpaths[0])
display(edf_roundtrip.signals)
display(edf_roundtrip.signals[0].data)
edf_roundtrip.recording
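
When comparing round-tripped samples with the originals, small differences are expected: EDF stores each sample as a 16-bit integer and maps it linearly between the physical minimum/maximum and the digital minimum/maximum recorded in the header. The sketch below illustrates that mapping in plain Python. The function names and the 0 to 500 depth range are assumptions for illustration, not values from DiveDB or edfio.

```python
def to_digital(value, physical_range, digital_range=(-32768, 32767)):
    """Map a physical value to the nearest 16-bit digital value."""
    pmin, pmax = physical_range
    dmin, dmax = digital_range
    gain = (pmax - pmin) / (dmax - dmin)  # physical units per digital step
    return round((value - pmin) / gain) + dmin


def to_physical(digital, physical_range, digital_range=(-32768, 32767)):
    """Map a stored digital value back to physical units."""
    pmin, pmax = physical_range
    dmin, dmax = digital_range
    gain = (pmax - pmin) / (dmax - dmin)
    return (digital - dmin) * gain + pmin


depth_range = (0.0, 500.0)  # assumed physical min/max for the example
original = 123.4
restored = to_physical(to_digital(original, depth_range), depth_range)
step = (depth_range[1] - depth_range[0]) / 65535
print(original, restored, abs(original - restored) <= step / 2 + 1e-9)
```

In this scheme the round-trip error is bounded by half a digital step, so a comparison against the source data needs a tolerance derived from the physical range rather than an exact equality check.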

In [None]:
import DiveDB.services.dive_data as dd
importlib.reload(dd)

m = dd.get_metadata(results)
print(m)

In [None]:
demo_metadata_df = duckpond.conn.sql("""
    SELECT *
    FROM Metadata.public.Recordings
    WHERE Recordings.id = '2019-11-08_apfo-001a_apfo-001a_CC-35'
""").df()
display(demo_metadata_df)