# A1: Meta Data Files

For organizing data into NWB files, there is a lot of 'metadata' that needs to be organized and accessed. 

There are different types of metadata relating to, for example, the study, task, participant, equipment, etc. This information is a mix of general information (the same across all participants) and custom information (for a given subject), and includes elements that can be automatically parsed and added, and information that may require some manual setting. 

A potential option for managing this information is to adopt a readable file format, most likely borrowing from a common config file format. Then, converting data for a new session can load a set of metadata files that reflect information for different elements of the session (for example: the study file, the task file, the site file, etc.).

The desired file type is therefore something that is:
- machine readable (to be automatically loaded and applied during file conversion)
- human readable and writable, so that people organizing the data can edit as needed
    - if the files are easy to interact with for manual elements, this hopefully minimizes errors

Potential 'config' file types:
- txt
    - Pro: simple, universal; Con: no particular structure, so parsing is annoying
- ini / cfg
    - Pro: simple, supported by the Python standard library (`configparser`); Con: all values are read as strings, and longer strings need indented continuation lines
- json
    - Pro: simple, supported by the Python standard library (`json`); Con: a little more annoying to read / write by hand (I think), no comments
- yaml
    - Pro: well supported, clean to write, supports longer strings & comments; Con: not in the Python standard library (needs a package such as PyYAML), more complex
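
As a quick check on the ini / cfg entry, here is a minimal sketch of how `configparser` handles a longer string and a list. The section and field names are hypothetical, chosen only for illustration. Longer values work if continuation lines are indented, but lists have to be split out of a plain string by hand.

```python
import configparser

# Hypothetical INI content; the section and field names are illustrative only
INI_TEXT = """
[Task]
name = xx
description = This is a description of a thing
    that uses a whole bunch of words.
keywords = neurosurgery, single-units
"""

cparser = configparser.ConfigParser()
cparser.read_string(INI_TEXT)

# Continuation lines are joined with newlines; collapse them into one string
description = " ".join(cparser["Task"]["description"].split())

# There is no list type, so a list has to be split out of a string manually
keywords = [word.strip() for word in cparser["Task"]["keywords"].split(",")]

print(description)
print(keywords)
```

This extra parsing step is part of why YAML and JSON, which have native lists, are a more natural fit here.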
    
The most relevant files for our purposes are probably YAML & JSON. YAML is potentially a bit better for 'human maintained' files, which might make it slightly preferred. Note that some information that needs to be listed may be relatively lengthy strings and/or lists of elements (somewhat stretching the notion of a "config" file). 

Metadata files for a particular experiment / task will have to be initially written out, defining metadata fields and descriptions of interest. Once these files are set up, converting a new session of data should be largely automated, requiring minimal manual intervention to set custom metadata values. We also want to stay flexible for metadata that may be collected / transmitted in variable ways, requiring some manual organization. This setup should allow someone responsible for receiving & organizing data to do manual curation, as needed, and should make this easy to integrate into the pipeline. 
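
One way this workflow could look in code is sketched below: general metadata is combined with session-specific values, and any required field that is still empty is reported. The function name `build_session_metadata`, the field names, and the example values are all hypothetical, not part of an existing pipeline.

```python
def build_session_metadata(general, custom, required=()):
    """Combine general metadata with session-specific values.

    Values in `custom` take precedence over values in `general`.
    """
    metadata = {**general, **custom}

    # Flag required fields that were never filled in
    missing = [field for field in required if metadata.get(field) in (None, "")]
    if missing:
        raise ValueError(f"Missing metadata fields: {missing}")

    return metadata


# Hypothetical values, standing in for the contents of loaded metadata files
task_info = {"task": "xx", "location": "Baylor Hospital", "subject_id": None}
session_info = {"subject_id": "sub-01"}

metadata = build_session_metadata(
    task_info, session_info, required=["task", "subject_id"])
print(metadata)
```

Raising an error on missing fields is a design choice: it makes the conversion stop early when manual curation is still needed, rather than writing an incomplete file.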

A possible alternative is to write the metadata directly into the Python code and scripts that do the file conversion. This is what is done in the Rutishauser dataset (see [this file](https://github.com/rutishauserlab/recogmem-release-NWB/blob/master/RutishauserLabtoNWB/events/newolddelay/python/export/no2nwb.py) for example). This approach has the limitation of mixing code & data in ways that are not very modular, and may make it more difficult to manually interact with metadata and customize things as needed.

In [1]:
%config Completer.use_jedi = False

In [2]:
import os
from os.path import join as pjoin

# Settings

In [3]:
# Define directory where metadata files are located
metadata_folder = 'metadata'

In [4]:
# List the files in the metadata folder
metadata_files = os.listdir(metadata_folder)
print(metadata_files)

['test_config.json', 'test_config.yaml', 'subject_info.yaml', 'test_config.cfg', 'test_ini.ini', '.ipynb_checkpoints', 'site_info.yaml', 'task_info.yaml']


In [5]:
def file_filter(files, ext):
    """Return the file names in `files` that end with the extension `ext`."""
    return [file for file in files if file.endswith(ext)]

## Config Files (cfg / ini)

Note that (I think) [INI files](https://en.wikipedia.org/wiki/INI_file) and files that are sometimes given the `.cfg` extension in Python are effectively the same thing. 

These files are supported by the `configparser` module. 

Relevant pages:
- https://docs.python.org/3/library/configparser.html

In [6]:
import configparser

In [7]:
cparser = configparser.ConfigParser()

In [8]:
ini_files = file_filter(metadata_files, '.cfg')
ini_files

['test_config.cfg']

In [9]:
cparser.read(pjoin(metadata_folder, ini_files[0]))

['metadata/test_config.cfg']

In [10]:
cparser.sections()

['Recording', 'Device', 'Task']

In [11]:
for key in cparser['Recording']:
    print(key)

location
value


In [12]:
cparser['Recording']['location']

'Baylor Hospital'
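
To pass this information on to other code, it can be convenient to turn the parser into a plain nested dictionary. A minimal sketch, using hypothetical file content that mirrors the section names shown above:

```python
import configparser

# Hypothetical file content, loosely mirroring the sections shown above
cparser = configparser.ConfigParser()
cparser.read_string("""
[Recording]
location = Baylor Hospital

[Device]
name = xx
""")

# Convert each section into a regular dictionary of strings
config = {section: dict(cparser[section]) for section in cparser.sections()}
print(config)
```

Note that, by default, `configparser` lowercases option names and returns every value as a string, so any numbers need to be converted explicitly.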

## YAML

Relevant pages:
- https://www.cloudbees.com/blog/yaml-tutorial-everything-you-need-get-started/

In [13]:
import yaml

In [14]:
yaml_files = file_filter(metadata_files, '.yaml')
yaml_files

['test_config.yaml', 'subject_info.yaml', 'site_info.yaml', 'task_info.yaml']

In [15]:
with open(pjoin(metadata_folder, yaml_files[0]), 'r') as stream:
    data = yaml.safe_load(stream)

In [16]:
type(data)

dict

In [17]:
data

{'name': 'xx',
 'description': 'This is a description of a thing that uses a whole bunch of words.',
 'keywords': ['neurosurgery', 'single-units']}
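
The features that make YAML attractive for hand-maintained files (comments, longer strings, and lists) can be seen in a small example. This is a sketch with hypothetical content, parsed from a string rather than a file, and it assumes the third-party PyYAML package is installed.

```python
import yaml  # PyYAML, a third-party package

# Hypothetical YAML content; the field names are illustrative only
YAML_TEXT = """
# Comments like this one are ignored by the parser
name: xx
description: >
  This is a description of a thing
  that uses a whole bunch of words.
keywords:
  - neurosurgery
  - single-units
"""

data = yaml.safe_load(YAML_TEXT)

# The folded style (>) joins the lines with spaces and keeps a trailing newline
print(data["description"].strip())
print(data["keywords"])
```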

## JSON

In [18]:
import json

In [19]:
json_files = file_filter(metadata_files, '.json')
json_files

['test_config.json']

In [20]:
with open(pjoin(metadata_folder, json_files[0]), 'r') as json_file:
    data = json.load(json_file)

In [21]:
data

{'id': 1111,
 'name': 'xx',
 'description': 'This is a description of a thing that uses a whole bunch of words.',
 'keywords': ['neurosurgery', 'single-units']}
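
Since these files also need to be written out initially, here is a minimal sketch of saving metadata to JSON and loading it back. The file name and the temporary directory are assumptions for illustration. Passing `indent` to `json.dump` makes the file much easier to read and edit by hand.

```python
import json
import os
import tempfile

# Hypothetical metadata, matching the shape of the example loaded above
metadata = {
    "id": 1111,
    "name": "xx",
    "description": "This is a description of a thing that uses a whole bunch of words.",
    "keywords": ["neurosurgery", "single-units"],
}

# Write to a temporary location so that no existing file is overwritten
path = os.path.join(tempfile.mkdtemp(), "example_config.json")
with open(path, "w") as json_file:
    json.dump(metadata, json_file, indent=4)

# Load the file back to check that nothing was lost in the round trip
with open(path, "r") as json_file:
    reloaded = json.load(json_file)

print(reloaded == metadata)
```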