# Metadata Files

We have a bunch of metadata we need to store. Some of it is shared across different recording sessions.

I think one plausible option is to adopt a human-readable file format, most likely borrowing from a common config file format.

Then, a new session can reuse the shared files (for example: the study file, the task file, the site file, etc).
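One way the shared files could combine with session-specific metadata, sketched under assumptions: each shared file (site, task, study) loads into a dictionary, and a per-session file only adds or overrides fields. The `merge_metadata` helper and all the dictionary contents below are hypothetical, and plain dictionaries stand in for whichever file format is eventually chosen.

```python
from typing import Any, Dict


def merge_metadata(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Merge metadata dictionaries left to right (hypothetical helper).

    Later layers override earlier ones; nested dictionaries are merged
    recursively instead of being replaced wholesale.
    """
    merged: Dict[str, Any] = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = merge_metadata(merged[key], value)
            else:
                merged[key] = value
    return merged


# Hypothetical contents of shared files, reused across sessions.
site_info = {"institution": "Example Hospital", "device": {"name": "amp", "sampling_rate": 30000}}
task_info = {"task": "example_task", "description": "Shared task description."}

# Hypothetical session-specific file: only what differs for this recording.
session_info = {"session_id": "ses-01", "device": {"sampling_rate": 32000}}

metadata = merge_metadata(site_info, task_info, session_info)
print(metadata["device"])  # → {'name': 'amp', 'sampling_rate': 32000}
```

Keeping the per-session file small like this is one way to get the "plug-and-play" updates for new recordings, since most fields come from files that are written once.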

The desired file type is therefore something that is:
- machine readable (to be automatically loaded and applied during file conversion)
- human readable and writable, so that people organizing the data can edit it as needed
    - if the files are easy to edit by hand, this hopefully minimizes errors

Potential 'config' file types:
- txt
    - Pro: simple, universal; Con: no particular structure, so parsing is annoying
- ini / cfg
    - Pro: simple, natively supported in Python; Con: values are plain strings (no native lists or nesting beyond sections), and I don't think it supports longer strings well
- json
    - Pro: simple, natively supported in Python; Con: a little more annoying to read / write (I think), no comments
- yaml
    - Pro: well supported, clean to write, supports longer strings, has comments; Con: not in the Python standard library (needs a package such as PyYAML), more complex
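As a quick, standard-library-only check of the comment-related pros and cons above: `configparser` accepts full-line `#` (or `;`) comments, while the `json` module rejects them. The file contents here are made up for illustration.

```python
import configparser
import json

ini_text = """
# Full-line comments are allowed in INI files
[Recording]
location = Baylor Hospital
"""

json_text = """
{
  // JSON has no comment syntax, so this line is a parse error
  "location": "Baylor Hospital"
}
"""

config = configparser.ConfigParser()
config.read_string(ini_text)
print(config["Recording"]["location"])  # → Baylor Hospital

try:
    json.loads(json_text)
    json_accepts_comments = True
except json.JSONDecodeError as err:
    json_accepts_comments = False
    # The decoder reports where parsing failed.
    print(f"JSON error: {err.msg} (line {err.lineno}, column {err.colno})")
```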
    
I think there's not much difference between YAML and JSON for our purposes. YAML is potentially a bit better for 'human maintained' files, which might make it slightly preferred. 
    
Notes: 
- A possible alternative is to write the metadata directly into the Python code and scripts that do the file conversion. This is effectively the approach taken for the Rutishauser dataset (see [this file](https://github.com/rutishauserlab/recogmem-release-NWB/blob/master/RutishauserLabtoNWB/events/newolddelay/python/export/no2nwb.py) for example), but I feel like this complicates the code, and mixes code & data in broadly unhelpful ways. 
- These files will have to be written out initially, defining the metadata and descriptions of interest. The goal is that once these files are set up for a particular experiment, updating them for new recordings should be plug-and-play.
- I presume some amount of metadata arrives in pretty variable formats, and someone has to organize it. Part of the idea here is that whoever receives the data and organizes it for conversion can do manual curation as needed, and that this should be as easy and foolproof as possible. 
- We are somewhat stretching the notion of a "configuration" file, sometimes including relatively lengthy strings. File formats that better support longer text are therefore potentially useful.
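Since longer free-text descriptions are part of the requirement, here is a small sketch of how the two standard-library options handle a multi-line value. `configparser` does accept multi-line values written as indented continuation lines (so the concern may be more about readability than capability), while JSON needs the whole value on one line with `\n` escapes. The description text is invented.

```python
import configparser
import json

# INI: continuation lines must be indented relative to the key.
ini_text = """
[Task]
description = First line of a longer task description.
    Second line, indented so configparser treats it as a continuation.
"""

config = configparser.ConfigParser()
config.read_string(ini_text)
ini_description = config["Task"]["description"]

# JSON: the value is a single string; line breaks are written as \n escapes.
json_text = '{"description": "First line of a longer task description.\\nSecond line."}'
json_description = json.loads(json_text)["description"]

# Both come back as ordinary Python strings containing newlines.
print(ini_description.splitlines())
print(json_description.splitlines())
```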

```python
# IPython setting: turn off jedi-based tab completion (notebook environment only)
%config Completer.use_jedi = False
```

## Setup

```python
import os
from os.path import join as pjoin
```

```python
config_files = os.listdir('configs')
print(config_files)
```

```
['test_config.json', 'test_config.yaml', 'subject_info.yaml', 'test_config.cfg', 'test_ini.ini', 'site_info.yaml', 'task_info.yaml']
```


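```python
def file_filter(files, ext):
    """Return the file names in `files` that end with extension `ext` (e.g. '.yaml')."""
    # endswith avoids matching names that merely contain the extension, e.g. 'x.yaml.bak'
    return [file for file in files if file.endswith(ext)]
```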
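A possible alternative to filtering `os.listdir` output by hand is `pathlib`'s glob matching, which only matches the suffix (so, unlike a plain substring test, it won't pick up names like `notes.yaml.bak`). This is a sketch rather than what the notebook does; `find_config_files` is a hypothetical name, and a temporary directory with made-up file names stands in for `configs`.

```python
import tempfile
from pathlib import Path


def find_config_files(folder, ext):
    """Return sorted file names in `folder` ending with extension `ext` (hypothetical helper)."""
    return sorted(path.name for path in Path(folder).glob(f"*{ext}"))


# Demo with a throwaway folder standing in for 'configs'.
with tempfile.TemporaryDirectory() as folder:
    for name in ["task_info.yaml", "site_info.yaml", "test_config.json", "notes.yaml.bak"]:
        (Path(folder) / name).touch()
    yaml_files = find_config_files(folder, ".yaml")

print(yaml_files)  # → ['site_info.yaml', 'task_info.yaml']
```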

## Config Files (cfg / ini)

Note that (I think) [INI files](https://en.wikipedia.org/wiki/INI_file) and the files sometimes given the `.cfg` extension in Python are effectively the same format. 

These files are supported by the `configparser` module. 

Relevant pages:
- https://docs.python.org/3/library/configparser.html
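One detail worth knowing up front: values come back from `configparser` as strings. Here is a minimal sketch using `read_string` (so no file is needed) together with the typed getters `getint`, `getfloat` and `getboolean`, plus the `fallback` argument for optional keys. The section and key names are hypothetical.

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[Device]
name = example_amplifier
sampling_rate = 30000
gain = 1.5
referenced = yes
""")

device = config["Device"]
print(repr(device["sampling_rate"]))  # → '30000' (raw values are always strings)

sampling_rate = device.getint("sampling_rate")
gain = device.getfloat("gain")
referenced = device.getboolean("referenced")  # accepts yes/no, true/false, on/off, 1/0
notes = device.get("notes", fallback="none recorded")  # optional key with a default

print(sampling_rate, gain, referenced, notes)  # → 30000 1.5 True none recorded
```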

```python
import configparser
```

```python
config = configparser.ConfigParser()
```

```python
ini_files = file_filter(config_files, '.cfg')
ini_files
```

```
['test_config.cfg']
```

```python
config.read(pjoin('configs', ini_files[0]))
```

```
['configs/test_config.cfg']
```

```python
config.sections()
```

```
['Recording', 'Device', 'Task']
```

```python
for key in config['Recording']:
    print(key)
```

```
location
value
```


```python
config['Recording']['location']
```

```
'Baylor Hospital'
```
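Note that by default `configparser` lower-cases option names (keys), though not section names. If case matters in the metadata (for example, keys that should map onto field names elsewhere), setting `optionxform = str` before reading preserves it. The `SubjectID` key is a hypothetical example.

```python
import configparser

text = """
[Subject]
SubjectID = S01
"""

default_parser = configparser.ConfigParser()
default_parser.read_string(text)
print(list(default_parser["Subject"]))  # → ['subjectid']

case_parser = configparser.ConfigParser()
case_parser.optionxform = str  # keep option names exactly as written
case_parser.read_string(text)
print(list(case_parser["Subject"]))  # → ['SubjectID']
```

Lookups on the default parser still work with either spelling, because the same transform is applied to the key being looked up.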

## YAML

Relevant pages:
- https://www.cloudbees.com/blog/yaml-tutorial-everything-you-need-get-started/
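To illustrate the longer-string and comment support listed above, here is a minimal sketch using PyYAML (installed separately, e.g. `pip install pyyaml`, since YAML is not in the standard library). A `|` block keeps line breaks, while a `>` block folds lines into one paragraph. `safe_load` is used, as in the cells below, because it only builds plain Python types. The metadata values are made up.

```python
import yaml  # PyYAML

text = """
# Comments are allowed anywhere in YAML
name: example_task
keywords: [neurosurgery, single-units]
notes: |
  Line breaks are kept
  in a literal block.
description: >
  Folded blocks join lines
  into one paragraph.
"""

data = yaml.safe_load(text)
print(repr(data["notes"]))        # → 'Line breaks are kept\nin a literal block.\n'
print(repr(data["description"]))  # → 'Folded blocks join lines into one paragraph.\n'
```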

```python
import yaml
```

```python
yaml_files = file_filter(config_files, '.yaml')
yaml_files
```

```
['test_config.yaml', 'subject_info.yaml', 'site_info.yaml', 'task_info.yaml']
```

```python
with open(pjoin('configs', yaml_files[0]), 'r') as stream:
    data = yaml.safe_load(stream)
```

```python
type(data)
```

```
dict
```

```python
data
```

```
{'name': 'xx',
 'description': 'This is a description of a thing that uses a whole bunch of words.',
 'keywords': ['neurosurgery', 'single-units']}
```
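Because the loaded metadata is just a `dict`, a light validation step could catch curation mistakes (a missing or misspelled key, a wrong type) before conversion runs. The `check_metadata` helper and the required-key list below are hypothetical; the example dictionary mirrors the YAML output above.

```python
def check_metadata(data, required):
    """Return a list of problems found in a loaded metadata dict (hypothetical helper).

    `required` maps each required key to its expected Python type.
    """
    problems = []
    for key, expected_type in required.items():
        if key not in data:
            problems.append(f"missing key: {key!r}")
        elif not isinstance(data[key], expected_type):
            problems.append(
                f"{key!r} should be {expected_type.__name__}, got {type(data[key]).__name__}"
            )
    return problems


data = {
    "name": "xx",
    "description": "This is a description of a thing that uses a whole bunch of words.",
    "keywords": ["neurosurgery", "single-units"],
}

required = {"name": str, "description": str, "keywords": list, "id": int}
print(check_metadata(data, required))  # → ["missing key: 'id'"]
```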

## JSON

```python
import json
```

```python
json_files = file_filter(config_files, '.json')
json_files
```

```
['test_config.json']
```

```python
with open(pjoin('configs', json_files[0]), 'r') as json_file:
    data = json.load(json_file)
```
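For hand-edited JSON, a small typo such as a trailing comma stops the whole load. Here is a small sketch that re-raises the parse error with the file name, line and column taken from `json.JSONDecodeError`, which may make such mistakes easier for whoever curates the files to find. The `load_json_metadata` wrapper is hypothetical.

```python
import json


def load_json_metadata(text, source="<string>"):
    """Parse JSON metadata, re-raising parse errors with a readable location (hypothetical helper)."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"{source}: {err.msg} at line {err.lineno}, column {err.colno}"
        ) from err


broken = '{\n  "id": 1111,\n  "name": "xx",\n}'  # trailing comma after the last entry

try:
    load_json_metadata(broken, source="test_config.json")
except ValueError as err:
    print(err)
```

The exact error message varies between Python versions, so it is best treated as a hint for the person editing the file rather than something to parse.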

```python
data
```

```
{'id': 1111,
 'name': 'xx',
 'description': 'This is a description of a thing that uses a whole bunch of words.',
 'keywords': ['neurosurgery', 'single-units']}
```
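Since all three formats end up as similar Python structures, a single entry point that picks a parser by file extension could hide the format choice from the conversion code. This is a hypothetical sketch: `load_metadata` is not part of the notebook, INI sections are turned into nested dicts whose values stay strings, and PyYAML is only imported if a YAML file is actually requested.

```python
import configparser
import json
import tempfile
from pathlib import Path


def load_metadata(path):
    """Load a metadata file into a dict, choosing the parser from the extension (hypothetical helper)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with open(path, "r", encoding="utf-8") as file:
            return json.load(file)
    if suffix in (".yaml", ".yml"):
        import yaml  # PyYAML; imported lazily so JSON/INI files need only the stdlib
        with open(path, "r", encoding="utf-8") as file:
            return yaml.safe_load(file)
    if suffix in (".ini", ".cfg"):
        config = configparser.ConfigParser()
        config.read(path, encoding="utf-8")
        # Convert each section into a plain dict; values stay strings.
        return {section: dict(config[section]) for section in config.sections()}
    raise ValueError(f"Unsupported metadata file type: {suffix!r}")


# Demo with throwaway files standing in for the 'configs' folder.
with tempfile.TemporaryDirectory() as folder:
    ini_path = Path(folder) / "example.cfg"
    ini_path.write_text("[Recording]\nlocation = Baylor Hospital\n")
    json_path = Path(folder) / "example.json"
    json_path.write_text('{"id": 1111, "name": "xx"}')
    print(load_metadata(ini_path))   # → {'Recording': {'location': 'Baylor Hospital'}}
    print(load_metadata(json_path))  # → {'id': 1111, 'name': 'xx'}
```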