# Export EDF

This notebook demonstrates the process of exporting DiveDB data as an EDF file.

While under development, it also contains the prototype (non-library) code; that'll be deleted when this notebook is ready to be merged into the main branch.

Punch list:
- [x] Make a list
- [x] Understand task :) 
- [x] Prototype:
    - [x] Load basic metadata
    - [x] Load signals
    - [x] Generate EDF file 
        - [x] Can mne serve our needs here? Check if multiple sample rates, arbitrary metadata: edfio can!
        - [x] Decide if different library OR extend mne: use edfio, which is what mne depends on 
    - [x] Test EDF file can be opened externally (e.g. through EDF.jl or other app)
    - [x] Test EDF encodes max/min values
    - [x] Get recording metadata
- [x] Turn prototype into library code
- [x] Add metadata to EDF header
- [ ] Add tests
- [ ] Make note of edge case tests to file
- [ ] Clean up this notebook (delete this punch list!)
- [ ] Mark PR ready for review

Reminder: this is the end goal.

```python
# Example of usage once complete

import os

from DiveDB.services.duck_pond import DuckPond

duckpond = DuckPond(os.environ["CONTAINER_DELTA_LAKE_PATH"])

dive_data = duckpond.get_delta_data(    
    labels=["eeg"],
    animal_ids="apfo-001a",
)

# Originally this took a file path; it now takes an output directory instead.
# dive_data.export_to_edf("path_to_output.edf")
dive_data.export_to_edf("path_to_output_dir")

```

### Prototype

In [1]:
# Test previous behavior holds
import os
import importlib
import DiveDB.services.duck_pond as dp
importlib.reload(dp)

duckpond = dp.DuckPond(os.environ["CONTAINER_DELTA_LAKE_PATH"])

# Example from the querying_docs notebook
data = duckpond.get_delta_data(    
    labels=["derived_data_depth"],
    animal_ids="apfo-001a",
    frequency=1/60,  # Once a minute
)
display(data)


|     | datetime                  | derived_data_depth |
|-----|---------------------------|--------------------|
| 0   | 2019-11-07 19:50:00+00:00 | -1.982753          |
| 1   | 2019-11-07 19:51:00+00:00 | -1.900138          |
| 2   | 2019-11-07 19:52:00+00:00 | -1.660804          |
| 3   | 2019-11-07 19:53:00+00:00 | -1.362733          |
| 4   | 2019-11-07 19:54:00+00:00 | -1.074403          |
| ... | ...                       | ...                |
| 173 | 2019-11-07 22:43:00+00:00 | -0.466027          |
| 174 | 2019-11-07 22:44:00+00:00 | -0.794010          |
| 175 | 2019-11-07 22:45:00+00:00 | -1.119214          |
| 176 | 2019-11-07 22:46:00+00:00 | -1.418781          |

In [2]:

# Okay, but is there a way to find out what animal_ids, etc, are available?
# Time to go spelunking!
duckpond.get_db_schema()

# ...okay, cool. :) 

(Output truncated: a table whose visible columns are `database`, `schema`, `name`, and `column_names`; the remaining columns and all rows were cut off.)

In [3]:
# Let's try a sql query as well (also ripped from the querying_docs notebook)
labels_df = duckpond.conn.sql("""
    SELECT label
    FROM (
        SELECT DISTINCT label
        FROM DataLake
    )
""").df()
# display(labels_df)

animals_df = duckpond.conn.sql("""
    SELECT animal
    FROM (
        SELECT DISTINCT animal
        FROM DataLake
    )
""").df()
# display(animals_df)

signal_df = duckpond.conn.sql("""
    SELECT class, label
    FROM (
        SELECT DISTINCT label, class
        FROM DataLake
    )
""").df()
type(signal_df)


pandas.core.frame.DataFrame
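
The nested subqueries in the cell above are probably not needed: applying `SELECT DISTINCT` to the table directly should return the same rows. The sketch below illustrates that equivalence with the standard-library `sqlite3` module standing in for DuckDB. The `DataLake` table here is a toy with invented rows, not DiveDB data:

```python
import sqlite3

# In-memory toy table; sqlite3 is only a stand-in for the DuckDB connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DataLake (animal TEXT, class TEXT, label TEXT)")
conn.executemany(
    "INSERT INTO DataLake VALUES (?, ?, ?)",
    [
        ("animal-a", "class_one", "x"),
        ("animal-a", "class_one", "x"),  # duplicate row on purpose
        ("animal-a", "class_one", "y"),
        ("animal-b", "class_two", "z"),
    ],
)

# Shape of the query used in the notebook cell.
nested = conn.execute(
    "SELECT class, label FROM (SELECT DISTINCT label, class FROM DataLake)"
).fetchall()

# Flattened equivalent.
flat = conn.execute("SELECT DISTINCT class, label FROM DataLake").fetchall()

print(sorted(nested) == sorted(flat))
```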

In [4]:
import DiveDB.services.dive_data as dd
importlib.reload(dd)
import DiveDB.services.duck_pond as dp
importlib.reload(dp)

duckpond = dp.DuckPond(os.environ["CONTAINER_DELTA_LAKE_PATH"])

results = duckpond.get_delta_data(    
    classes=["derived_data_depth", "sensor_data_accelerometer"],
    animal_ids="apfo-001a",
    # limit=1000, # 0000
)
results.get_metadata()
results.duckdb_relation.columns
results

FloatProgress(value=0.0, layout=Layout(width='auto'), style=ProgressStyle(bar_color='black'))

<DiveDB.services.dive_data.DiveData at 0xffff4a3157f0>

In [5]:
import DiveDB.services.dive_data as dd
importlib.reload(dd)

dd.get_metadata(results)

{'2019-11-08_apfo-001a_apfo-001a_CC-35': {'logger_id': 'CC-35',
  'animal_id': 'apfo-001a',
  'deployment_id': '2019-11-08_apfo-001a'}}
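
Judging from the output above, each recording key looks like the deployment ID, animal ID, and logger ID joined with underscores. A minimal sketch of that apparent convention; the function `recording_key` is hypothetical, and the actual construction in `DiveDB.services.dive_data` may differ:

```python
def recording_key(meta):
    """Join the identifying fields in the order they appear in the key."""
    return "_".join([meta["deployment_id"], meta["animal_id"], meta["logger_id"]])


# Values copied from the metadata output above.
meta = {
    "logger_id": "CC-35",
    "animal_id": "apfo-001a",
    "deployment_id": "2019-11-08_apfo-001a",
}

key = recording_key(meta)
print(key)
```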

In [8]:
import DiveDB.services.dive_data as dd
importlib.reload(dd)

outpaths = dd.export_to_edf(results, ".tmp/foo")
outpaths


accelerometer-ax - From ax with class sensor_data_accelerometer
derived_depth - From derived_data_depth with class derived_data_depth
accelerometer-ay - From ay with class sensor_data_accelerometer
accelerometer-az - From az with class sensor_data_accelerometer


['.tmp/foo/2019-11-08_apfo-001a_apfo-001a_CC-35_4.edf']
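
One constraint worth keeping in mind for the signal names printed above: the EDF header reserves 16 ASCII characters per signal label, so `accelerometer-ax` (exactly 16 characters) only just fits. The sketch below shows one way to validate and pad a label to that width. The helper `edf_label_field` is hypothetical and only illustrates the constraint; edfio handles the actual encoding when the file is written:

```python
EDF_LABEL_WIDTH = 16  # characters reserved per signal label in the EDF header


def edf_label_field(label):
    """Encode a label as a space-padded, fixed-width ASCII header field."""
    encoded = label.encode("ascii")
    if len(encoded) > EDF_LABEL_WIDTH:
        raise ValueError(
            f"label {label!r} is {len(encoded)} characters; "
            f"EDF allows at most {EDF_LABEL_WIDTH}"
        )
    return encoded.ljust(EDF_LABEL_WIDTH)  # bytes.ljust pads with spaces


for name in ["accelerometer-ax", "derived_depth"]:
    print(edf_label_field(name))
```

A longer name, such as the raw class name `sensor_data_accelerometer`, would raise `ValueError` here, which is one reason for exporting under shortened labels.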

In [10]:
from edfio import read_edf

edf_roundtrip = read_edf(outpaths[0])
# display(edf_roundtrip)
display(edf_roundtrip.signals)
display(edf_roundtrip.signals[0].__dict__)
display(edf_roundtrip.signals[1].__dict__)
# display(edf_roundtrip.signals[0].data)
# display(edf_roundtrip.recording)
# display(edf_roundtrip.patient)

(<EdfSignal accelerometer-ax 400Hz>,
 <EdfSignal derived_depth 50Hz>,
 <EdfSignal accelerometer-ay 400Hz>,
 <EdfSignal accelerometer-az 400Hz>)

{'_sampling_frequency': 400.0,
 '_label': b'accelerometer-ax',
 '_transducer_type': b'TODO                                                                            ',
 '_physical_dimension': b'Unknown ',
 '_physical_min': b'-39.2266',
 '_physical_max': b'39.22541',
 '_digital_min': b'-32768  ',
 '_digital_max': b'32767   ',
 '_prefiltering': b'sensor                                                                          ',
 '_samples_per_data_record': b'400     ',
 '_reserved': b'                                ',
 '_lazy_loader': <edfio._lazy_loading.LazyLoader at 0xffff15df7380>}

{'_sampling_frequency': 50.0,
 '_label': b'derived_depth   ',
 '_transducer_type': b'TODO                                                                            ',
 '_physical_dimension': b'Unknown ',
 '_physical_min': b'-2.00532',
 '_physical_max': b'122.1133',
 '_digital_min': b'-32768  ',
 '_digital_max': b'32767   ',
 '_prefiltering': b'derived                                                                         ',
 '_samples_per_data_record': b'50      ',
 '_reserved': b'                                ',
 '_lazy_loader': <edfio._lazy_loading.LazyLoader at 0xffff15df7e30>}
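
The physical and digital min/max fields in these headers define the linear mapping EDF uses between stored 16-bit integers and physical values. The sketch below works through that mapping with the `accelerometer-ax` header values shown above. The function names are made up for illustration; edfio applies this scaling internally:

```python
import math


def edf_scaling(physical_min, physical_max, digital_min, digital_max):
    """Return (gain, offset) such that physical = gain * digital + offset."""
    gain = (physical_max - physical_min) / (digital_max - digital_min)
    offset = physical_max - gain * digital_max
    return gain, offset


# Header values copied from the accelerometer-ax output above.
gain, offset = edf_scaling(-39.2266, 39.22541, -32768, 32767)


def to_physical(digital):
    """Convert a stored integer sample to a physical value."""
    return gain * digital + offset


def to_digital(physical):
    """Convert a physical value to the nearest stored integer sample."""
    return round((physical - offset) / gain)


# The digital extremes map back onto the physical extremes.
assert math.isclose(to_physical(32767), 39.22541)
assert math.isclose(to_physical(-32768), -39.2266)

# Smallest representable step, in the signal's physical units.
print(f"resolution: {gain:.6f}")

# Record duration implied by samples per record and sampling frequency;
# the 50 Hz signal with 50 samples per record gives the same duration.
record_duration_s = 400 / 400.0
```

Because the physical range is spread over 65,536 integer steps, a wider physical range gives coarser resolution, so it matters that the exported min/max reflect the real extent of each signal.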

In [16]:
"Foxes"[0:-3]

'Fo'