# Loading Historical CNPL Data (ZStructs) with the neuraldecoding Dataset class
In this tutorial, you will learn the basics of loading data from the Chestek Lab with the `Dataset` class.
It also serves as a fairly standard introduction to working with datasets.

First set up your imports:

In [None]:
import neuraldecoding.dataset as neuraldataset
from omegaconf import OmegaConf



## Step 1. Config Files
Like many things in neuraldecoding, dataset loading is driven by a config file that defines the parameters used to load a dataset. This helps reproducibility: the config file serves as a record of what you ran (alongside other, more robust logging features).

This config file also has a standard format, so it can be integrated into larger yaml files as a subconfiguration.
Every config file starts like this:

```yaml
dataset_type: <insert dataset type>
autoload: <bool>
save_path: null # or a filepath
dataset_parameters:
```

This header is the same for all dataset types. `dataset_type` selects the code used to load and reformat the dataset you want; in our case, that is `zstruct`. `dataset_parameters` contains indented entries with all the parameters needed for your particular dataset type. A standard `zstruct` config looks like this:

```yaml
dataset_type: zstruct
autoload: True
save_path: null
dataset_parameters:
  overwrite: True
  experiment_type: monkey_emg_16_96
  server_dir: Z:\Data\Monkeys
  alt_filepath: null
  subject: Joker
  subject_id: "Monkey N"
  date: 2024-06-06
  run: 2
```
Let's load this config using OmegaConf (this may move to Hydra eventually):

In [15]:
sample_config_file = OmegaConf.load("..\\example_configs\\datasets\\xpc_monkey_EMG.yaml")
print(sample_config_file)

{'dataset_type': 'zstruct', 'autoload': True, 'save_path': None, 'dataset_parameters': {'overwrite': False, 'experiment_type': 'monkey_emg_16_96', 'server_dir': 'Z:\\Data\\Monkeys', 'alt_filepath': None, 'subject': 'Joker', 'subject_id': 'Monkey N', 'date': '2024-06-06', 'run': 2}}
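
Before handing a config to the loader, it can help to confirm that the four shared header keys are present. The following is a minimal sketch using a plain dict; `missing_header_keys` is a hypothetical helper written for this tutorial, not part of neuraldecoding. Because it only relies on the `in` operator, it should also accept an OmegaConf config, though that is an assumption rather than something tested here.

```python
# Hypothetical helper (not part of neuraldecoding): checks the shared config header.
REQUIRED_KEYS = ("dataset_type", "autoload", "save_path", "dataset_parameters")


def missing_header_keys(config):
    """Return the required top-level keys that are absent from `config`."""
    return [key for key in REQUIRED_KEYS if key not in config]


example = {
    "dataset_type": "zstruct",
    "autoload": True,
    "save_path": None,
    "dataset_parameters": {"subject": "Joker", "date": "2024-06-06", "run": 2},
}
print(missing_header_keys(example))             # []
print(missing_header_keys({"autoload": True}))  # ['dataset_type', 'save_path', 'dataset_parameters']
```

An empty list means the header is complete; the dataset-specific entries under `dataset_parameters` are not checked by this sketch.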


Now that it has been loaded, we can use it to create a dataset!

## Step 2. Loading the data
Once we've loaded a config file (or composed one in code), we can load a dataset by passing the config to the constructor. If `autoload` is `True`, the data is loaded automatically when the constructor runs; otherwise, we need to call `load_data()` on a separate line. The example config in Step 1 sets `overwrite` to `True` so you can see the file being loaded, but the config file loaded above has `overwrite: False`, so the existing NWB file is reused. Once an NWB version of the zstruct has been created, there is usually little need to overwrite it.

In [16]:
data = neuraldataset.Dataset(sample_config_file)

NWB file already exists, loading
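
The same config can also be composed in code instead of read from a file. The sketch below builds the Step 1 config as a plain dict with `autoload` off and then loads the data explicitly. `build_zstruct_config` and `load_dataset` are hypothetical helpers, and calling `load_data()` as a method on the `Dataset` instance is an assumption based on the description above.

```python
def build_zstruct_config(subject, date, run, autoload=False, overwrite=False):
    """Build a zstruct dataset config as a plain dict, mirroring the YAML in Step 1."""
    return {
        "dataset_type": "zstruct",
        "autoload": autoload,
        "save_path": None,
        "dataset_parameters": {
            "overwrite": overwrite,
            "experiment_type": "monkey_emg_16_96",
            "server_dir": "Z:\\Data\\Monkeys",
            "alt_filepath": None,
            "subject": subject,
            "subject_id": "Monkey N",
            "date": date,
            "run": run,
        },
    }


def load_dataset(config_dict):
    """Create a Dataset from a dict config and load it explicitly if needed."""
    # Imported here so the config builder above runs without the lab packages.
    from omegaconf import OmegaConf
    import neuraldecoding.dataset as neuraldataset

    config = OmegaConf.create(config_dict)
    data = neuraldataset.Dataset(config)
    if not config.autoload:
        # Assumption: load_data() is a method on the Dataset instance.
        data.load_data()
    return data


config_dict = build_zstruct_config(subject="Joker", date="2024-06-06", run=2)
print(config_dict["autoload"])  # False
# data = load_dataset(config_dict)  # requires access to the data server
```

Keeping `autoload` off separates constructing the object from the potentially slow read, which can be convenient when you want to inspect or adjust the config first.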


Now that we've loaded the data, we can have a look at it. Click the arrows to expand each module. The actual EMG and behavior data are saved in `processing`.

In [18]:
data.dataset

RuntimeError: Unable to synchronously get dataspace (identifier is not of specified type)

root pynwb.file.NWBFile at 0x2475913649840
Fields:
  acquisition: {
    ParasiteTime <class 'pynwb.base.TimeSeries'>
  }
  devices: {
    Cerebus <class 'pynwb.device.Device'>
  }
  electrode_groups: {
    Wire Electrodes <class 'pynwb.ecephys.ElectrodeGroup'>
  }
  electrodes: electrodes <class 'hdmf.common.table.DynamicTable'>
  experimenter: ['']
  file_create_date: [datetime.datetime(2025, 7, 15, 14, 37, 49, 54334, tzinfo=tzoffset(None, -14400))]
  identifier: Z:\Data\Monkeys\Joker\2024-06-06/Run-002
  institution: University of Michigan
  intervals: {
    trials <class 'pynwb.epoch.TimeIntervals'>
  }
  lab: Chestek Lab
  notes: - Date: 06/06/2024

- Goal: EMG decoding comparison with KF trained in March

- Experimenters: Matt, Jake, Aren, Maddi
- Recording Start Time: ~10:30 AM
- Recording Stop Time:  AM

- xPC Model: Rig_main_Cortical_Parasite_v2
- Recorded Array: EMG
- Data recorded: 2kSps 100-500HzBPass + broadband on cerebus, (also continuous on chans 17-96 recorded, but no s
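
If the interactive view is unavailable, the contents of `processing` can also be listed in code. The sketch below assumes the loaded object exposes `processing` as a mapping of module names to modules, and that each module exposes a `data_interfaces` mapping, as PyNWB files do. `summarize_processing` is a hypothetical helper, and a stand-in object is used so the example runs without the data server.

```python
from types import SimpleNamespace


def summarize_processing(nwbfile):
    """Map each processing module name to the sorted names of its data interfaces."""
    return {
        module_name: sorted(module.data_interfaces.keys())
        for module_name, module in nwbfile.processing.items()
    }


# Stand-in object so the sketch runs without the data server; with real data,
# pass `data.dataset` instead.
fake_nwb = SimpleNamespace(
    processing={
        "ecephys": SimpleNamespace(data_interfaces={"MAV": object()}),
    }
)
print(summarize_processing(fake_nwb))  # {'ecephys': ['MAV']}
```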

## Step 3. Using the data
Now we can use the data. A lot of this is still

In [18]:
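# Timestamps (in seconds, per the NWB convention) of the MAV series in the "ecephys" module
mav_timestamps = data.dataset.processing["ecephys"]["MAV"].timestamps[:]
mav_timestamps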

array([2.0000e-03, 3.0000e-03, 4.0000e-03, ..., 4.0007e+01, 4.0008e+01,
       4.0009e+01], shape=(40000,))
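
The printed timestamps step by 1 ms at both ends, yet 40,000 samples span about 40.007 s, so the series may not be perfectly uniform. A quick way to check is to compute the typical step and count the larger ones. This is a minimal sketch on synthetic values; `timestamp_spacing` and its `tolerance` parameter are hypothetical and not part of neuraldecoding.

```python
from statistics import median


def timestamp_spacing(timestamps, tolerance=1.5):
    """Return (median step, number of steps larger than tolerance * median)."""
    values = [float(t) for t in timestamps]
    steps = [later - earlier for earlier, later in zip(values, values[1:])]
    typical = median(steps)
    gaps = sum(1 for step in steps if step > tolerance * typical)
    return typical, gaps


# Synthetic timestamps with one missing sample between 0.004 and 0.006.
step, gaps = timestamp_spacing([0.002, 0.003, 0.004, 0.006])
print(round(step, 6), gaps)  # 0.001 1
```

With real data, the timestamps array read from the `MAV` series can be passed in directly, since the function only iterates over its input.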